IDOL KeyView Filter SDK 12.12 Java Programming Guide

Micro Focus

IDOL KeyView
Software Version 12.12
Filter SDK Java Programming Guide
Document Release Date: June 2022
Software Release Date: June 2022

Legal notices
© Copyright 2016-2022 Micro Focus or one of its affiliates. The only warranties for products and services of Micro Focus and its affiliates and licensors ("Micro Focus") are as may be set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. Micro Focus shall not be liable for technical or editorial errors or omissions contained herein. The information contained herein is subject to change without notice.
Documentation updates
The title page of this document contains the following identifying information:
· Software Version number, which indicates the software version.
· Document Release Date, which changes each time the document is updated.
· Software Release Date, which indicates the release date of this version of the software.
To check for updated documentation, visit https://www.microfocus.com/support-and-services/documentation/.
Support
Visit the MySupport portal to access contact information and details about the products, services, and support that Micro Focus offers. This portal also provides customer self-solve capabilities. It gives you a fast and efficient way to access interactive technical support tools needed to manage your business. As a valued support customer, you can benefit by using the MySupport portal to:
· View information about all services that Support offers
· Submit and track service requests
· Contact customer support
· Search for knowledge documents of interest
· View software vulnerability alerts
· Enter into discussions with other software customers
· Download software patches
· Manage software licenses, downloads, and support contracts
Many areas of the portal require you to sign in. If you need an account, you can create one when prompted to sign in.

IDOL KeyView (12.12)

Page 2 of 280

Contents
Part I: Overview of Filter SDK
  Chapter 1: Introducing Filter SDK
    Overview
    Features
    Platforms, Compilers, and Dependencies
    Supported Platforms
    Supported Compilers
    Software Dependencies
    Windows Installation
    UNIX Installation
    Package Contents
    License Information
    Enable Advanced Document Readers
    Pass License Information to KeyView
    Directory Structure
  Chapter 2: Getting Started
    Architectural Overview
    File Caching
    Filtering
    Subfile Extraction
    Use the Java Implementation of the API
    Input/Output Operations
    Filter in File or Stream Mode
    Multithreaded Filtering
    Before Running Your Application
    The Filter Process Model
    Filter API
    File Extraction API
    Persist the Child Process
    In the API
    In the formats.ini File
    Run Filter In Process
    In the API

    In the formats.ini File
    Run File Extraction Functions Out of Process
    Restart the File Extraction Server
    Out-of-Process Logging
    Enable Out-of-Process Logging
    Set the Verbosity Level
    Enable Windows Minidump
    Keep Log Files
    Run File Detection In or Out of Process
    Specify the Process Type In the formats.ini File
    Specify the Process Type In the API
    Stream Data to Filter
Part II: Use Filter SDK
  Chapter 3: Use the File Extraction API
    Introduction
    Extract Subfiles
    Sanitize Absolute Paths
    Extract Images
    Recreate a File Hierarchy
    Create a Root Node
    Example
    Extract Mail Metadata
    Default Metadata Set
    Extract the Default Metadata Set
    Extract All Metadata
    Microsoft Outlook (MSG) Metadata
    Extract MSG-Specific Metadata
    Microsoft Outlook Express (EML) and Mailbox (MBX) Metadata
    Extract EML- or MBX-Specific Metadata
    Lotus Notes Database (NSF) Metadata
    Extract NSF-Specific Metadata
    Microsoft Personal Folders File (PST) Metadata
    MAPI Properties
    Extract PST-Specific Metadata
    Exclude Metadata from the Extracted Text File
    Extract Subfiles from Outlook Files
    Extract Subfiles from Outlook Express Files

    Extract Subfiles from Mailbox Files
    Extract Subfiles from Outlook Personal Folders Files
    Choose the Reader to use for PST Files
    MAPI Attachment Methods
    Open Secured PST Files
    Detect PST Files While the Outlook Client is Running
    Extract Subfiles from Lotus Domino XML Language Files
    Extract .DXL Files to HTML
    Extract Subfiles from Lotus Notes Database Files
    System Requirements
    Installation and Configuration
    Windows
    Linux
    AIX 5.x
    Open Secured NSF Files
    Format Note Subfiles
    Extract Subfiles from PDF Files
    Improve Performance for PDFs with Many Small Images
    Extract Embedded OLE Objects
    Extract Subfiles from ZIP Files
    Default File Names for Extracted Subfiles
    Default File Name for Mail Formats
    Default File Name for Embedded OLE Objects
  Chapter 4: Use the Filter API
    Generate an Error Log
    Enable or Disable Error Logging
    Change the Path and File Name of the Log File
    Report Memory Errors
    Specify a Memory Guard
    Report the File Name in Stream Mode
    Example
    Specify the Maximum Size of the Log File
    Extract Metadata
    Extract Metadata for File Filtering
    Extract Metadata for Stream Filtering
    Example
    Convert Character Sets
    Determine the Character Set of the Output Text
    Guidelines for Character Set Conversion

    Set the Character Set During Filtering
    Set the Character Set During Subfile Extraction
    Prevent the Default Conversion of a Character Set
    Extract Tracked Deleted Text
    Filter PDF Files
    Use the pdf2sr Reader
    Filter PDF Files to a Logical Reading Order
    Rotated Text
    Extract Custom Metadata from PDF Files
    Skip Embedded Fonts
    Control Hyphenation
    Filter Portfolio PDF Files
    Table Detection for PDF Files
    Filter Spreadsheet Files
    Filter Worksheet Names
    Filter Hidden Text in Microsoft Excel Files
    Specify Date and Time Format on UNIX Systems
    Filter Very Large Numbers in Spreadsheet Cells to Precision Numbers
    Extract Microsoft Excel Formulas
    Standardize Cell Formats
    Filter Presentation Files to a Logical Reading Order
    Filter HTML Files
    Filter XML Files
    Configure Element Extraction for XML Documents
    Configure Headers and Footers
    Error Messages
    Tab Delimited Output for Embedded Tables
    Exclude Japanese Guide Text
    Source Code Identification
    Optical Character Recognition
    Optimize OCR Performance
    Configure the Proxy for RMS
    Document Restrictions
  Chapter 5: Sample Programs
    Introduction
    ExtractFilter
    FilterFileByChunk
    FilterFileToFile
    FilterFileToStream

    FilterStreamByChunk
    FilterStreamToFile
    FilterStreamToStream
    FilterTest
Part III: Appendixes
  Appendix A: Supported Formats
    Key to Supported Formats Table
    Supported Formats
    File Classes
  Appendix B: Document Readers
    Key to Document Readers Table
    Document Readers
  Appendix C: Platform Differences
    Feature Differences
    Reader Differences
  Appendix D: Character Sets
    Multibyte and Bidirectional Support
    Coded Character Sets
  Appendix E: Extract and Format Lotus Notes Subfiles
    Overview
    Customize XML Templates
    Use Demo Templates
    Use Old Templates
    Disable XML Templates
    Template Elements and Attributes
    Conditional Elements
    Control Elements
    Data Elements
    Date and Time Formats
    Lotus Notes Date and Time Formats
    KeyView Date and Time Formats

  Appendix F: File Format Detection
    Introduction
    Extract Format Information
    Determine Format Support
    Example formats.ini file entries
    Refine Detection of Text Files
    Allow Consecutive NULL Bytes in a Text File
    Translate Format Information
    Distinguish Between Formats
    Determine a Document Reader
    Additional Format Information
  Appendix G: List of Required Files for Redistribution
    Core Files
    Support Files
    Document Readers
  Appendix H: Develop a Custom Reader
    Introduction
    How to Write a Custom Reader
    Naming Conventions
    Basic Steps
    Token Buffer
    Macros
    Reader Interface
    Function Flow
    Example Development of fffFillBuffer()
    Implementation 1--fpFillBuffer() Function
    Structure of Implementation 1
    Problems with Implementation 1
    Implementation 2--Processing a Large Token Stream
    Structure of Implementation 2
    Problems with Implementation 2
    Boundary Conditions
    Implementation 3--Interrupting Structured Access Layer Calls
    Structure of Implementation 3
    Development Tips
    Functions
    xxxsrAutoDet()
    xxxAllocateContext()

    xxxFreeContext()
    xxxInitDoc()
    xxxFillBuffer()
    xxxGetSummaryInfo()
    xxxOpenStream()
    xxxCloseStream()
    xxxCharSet()
  Appendix I: Password Protected Files
    Supported Password Protected File Types
    Open Password Protected Container Files
    Filter Password Protected Files
  Appendix J: Microsoft Rights Management Service Protected Files
    Microsoft Azure Rights Management Service
    RMS Credentials
    Supported Formats
    Microsoft Office Files
    Implemented as pFile
    PDF Files
    Restricted Permission Messages
  Appendix K: OCR Supported Languages
  Send documentation feedback

Part I: Overview of Filter SDK
This section provides an overview of the Micro Focus KeyView Filter SDK and describes how to use the Java implementation of the API.
· Introducing Filter SDK
· Getting Started

Chapter 1: Introducing Filter SDK

This section describes the Filter SDK package.

· Overview
· Features
· Platforms, Compilers, and Dependencies
· Windows Installation
· UNIX Installation
· Package Contents
· License Information
· Directory Structure

Overview
Micro Focus KeyView Filter SDK enables you to incorporate text extraction functionality into your own applications. It extracts text and metadata from a wide variety of file formats on numerous platforms, and can automatically recognize over 1000 document types. It supports both file-based and stream-based I/O operations, and provides in-process or out-of-process filtering.
Filter SDK is part of the KeyView suite of products. KeyView provides high-speed text extraction, conversion to web-ready HTML and well-formed XML, and high-fidelity document viewing.

Features
· Document readers are threadsafe. The benefit of a threadsafe technology is that you can successfully extract text from hundreds of documents simultaneously. Documents are not queued for sequential filtering, but are actually filtered at the same time.
· Filter supports popular word processing, spreadsheet, and presentation formats. Body text, endnotes, footnotes, and additional items such as document metadata are all included as part of the filtering process.
· Sample programs are provided to demonstrate the functionality of the APIs.
· You can extract files embedded within files, such as email attachments or embedded OLE objects, by using the File Extraction API.
· Filter allows for redirected input and output. You can provide an input stream that is not restricted to file system access.
· Filter automatically recognizes the file type being filtered and uses the appropriate filter. Your application does not need to rely on file name extensions to determine file types.
· You can filter documents to specific character encodings, such as Unicode or UTF-8.
· You can write custom document readers for formats not directly supported by KeyView.
Platforms, Compilers, and Dependencies
This section lists the supported platforms, supported compilers, and software dependencies for the KeyView software.
Supported Platforms
The Java Filter SDK is supported on the following platforms.
· CentOS 7 x86, x64, and AArch64
· IBM AIX L6.1 PowerPC 32-bit and 64-bit
· IBM AIX L7.1 PowerPC 32-bit and 64-bit
· macOS 10.13 or later on 64-bit Apple-Intel architecture
· macOS 11 or later on Apple M1
· Microsoft Windows Server 2012 x64
· Microsoft Windows Server 2016 x64
· Microsoft Windows Server 2019 x64
· Microsoft Windows 8 x86 and x64
· Microsoft Windows 10 x64
· Oracle Solaris 10 SPARC
· Oracle Solaris 10 x86 and x64
· Red Hat Enterprise Linux 7 x64
· Red Hat Enterprise Linux 8 x64
· SuSE Linux Enterprise Server 12 x64
· SuSE Linux Enterprise Server 15 x64


Supported Compilers
The following table lists the supported compilers for the Java Filter SDK.

Component          Compiler
Java components    Java 7 to 17

Software Dependencies
Running KeyView on Windows requires the Microsoft Visual C++ 2019 redistributables. The redistributables are provided in the vcredist folder of the KeyView SDK, but you can download the latest installers from Microsoft to get the latest security, reliability, and performance improvements.
Running KeyView OCR and RMS decryption on 64-bit Linux requires libstdc++.so.6 and libgcc_s.so.1 from GCC 5.4. For your convenience, these are provided in the redist folder of your KeyView installation.
NOTE: If you are running KeyView out-of-process, the kvoop executable must be able to link to libstdc++.so.6 and libgcc_s.so.1.
· If these libraries are installed in a system folder, such as /lib64, KeyView finds them automatically.
· Alternatively, you can add the path of the folder containing these libraries to the environment variable LD_LIBRARY_PATH.
If you are running KeyView in-process:
· If your application already links to libgcc_s and libstdc++ from GCC 5.4 or later, KeyView uses them as well and no further action is needed.
· If your application links to earlier versions of libgcc_s and libstdc++, Micro Focus recommends that you upgrade those binaries to those from GCC 5.4 or later.
· If your application does not link to libgcc_s and libstdc++, you must ensure those binaries are available in the same way as described above for running KeyView out-of-process.
If older versions of libgcc_s and libstdc++ are provided (but at least those from GCC 4.8), most features continue to work, but Optical Character Recognition and RMS decryption do not.
Some KeyView components require specific third-party software:
· Java Runtime Environment (JRE) or Java Development Kit (JDK) version 7 to 17 is required for the Filter and Export Java APIs and for graphics conversion in the Export SDK.
· Outlook 2002 or later is required to process Microsoft Outlook Personal Folders (PST) files using the MAPI-based reader (pstsr). The native PST readers (pstxsr and pstnsr) do not require Outlook.
  NOTE: You must install an edition of Microsoft Outlook (32-bit or 64-bit) that matches the KeyView software. For example, if you use 32-bit KeyView, install 32-bit Outlook. If you use 64-bit KeyView, install 64-bit Outlook. If the editions do not match, KeyView returns Error 32: KVError_PSTAccessFailed and Microsoft Office Outlook displays the following error message: Either there is no default mail client or the current mail client cannot fulfill the messaging request. Please run Microsoft Outlook and set it as the default mail client.
· Lotus Notes or Lotus Domino is required for Lotus Notes database (NSF) file processing. The minimum requirement is 6.5.1, but version 8.5 is recommended.
· The Microsoft .NET Framework is required if you are using the .NET implementation of the API.
Windows Installation
To install the SDK on Windows, use the following procedure.
To install the SDK
1. Run the installation program, KeyViewProductNameSDK_VersionNumber_OS.exe, where ProductName is the name of the product, VersionNumber is the product version number, and OS is the operating system. For example:
    KeyViewFilterSDK_12.12_Windows_X86_64.exe
   The installation wizard opens.
2. Read the instructions and click Next. The License Agreement page opens.
3. Read the agreement. If you agree to the terms, click I accept the agreement, and then click Next. The Installation Directory page opens.
4. Select the directory in which to install the SDK. To specify a directory other than the default, click , and then specify another directory. After choosing where to install the SDK, click Next. The Pre-Installation Summary opens.
5. Review the settings, and then click Next. The SDK is installed.
6. Click Finish.


UNIX Installation
To install the SDK, use one of the following procedures.
To install the SDK from the graphical interface
· Run the installation program and follow the on-screen instructions.

To install the SDK from the console
1. Run the installation program from the console as follows:
    ./KeyViewFilterSDK_VersionNumber_Platform.exe --mode text
   where:
   VersionNumber    is the product version.
   Platform         is the name of the platform.

2. Read the welcome message and instructions and press Enter. The first page of the license agreement is displayed.
3. Read the license information, pressing Enter to continue through the text. After you finish reading the text, and if you accept the agreement, type Y and press Enter. You are asked to choose an installation folder.
4. Type an absolute path or press Enter to accept the default location. The Pre-Installation summary is displayed.
5. If you are satisfied with the information displayed in the summary, press Enter. The SDK is installed.

Package Contents

The Filter SDK installation contains:
· All the libraries and executables necessary for extracting text from a wide variety of formats.
· The include files that define the functions and structures used by the application to establish an interface with Filter:
    adapi.h           kvfilter.h
    adinfo.h          kvioobj.h
    kvcfsr.h          kvtoken.h
    kvcharset.h       kvtypes.h
    kverrorcodes.h    kvxtract.h
    kvfilt.h          kwautdef.h
    kvfilt2.h
· The Java API implemented in the package com.verity.api.filter contained in the file KeyView.jar.
· The .NET API implemented in the namespace Autonomy.API.Filter in the library FilterDotNet.dll.
· The C++ SDK, which can be found in the cppapi folder.
· Sample programs that demonstrate File Extraction and Filter functionality using the APIs.
· The files necessary to create a custom document reader, and the source for a sample document reader for UTF-8. See Develop a Custom Reader, on page 249.

License Information
Your license key controls whether you have the full version of the KeyView SDK, or a trial version. It also determines whether the following advanced features are enabled:
· Advanced character set detection with the character set detection library (kvlangdetect).
· Advanced document readers:
  · Microsoft Outlook Personal Folders (PST) readers (pstsr, pstnsr, and pstxsr)
  · Lotus Notes database (NSF) reader (nsfsr)
  · Mailbox (MBX) reader (mbxsr)
· Processing of documents protected by Microsoft RMS encryption.
· Optical Character Recognition (OCR) to attempt to filter text that might be visible in raster image files.
If you obtain a new license key from Micro Focus, you must update the licensing information that you pass to KeyView. See Pass License Information to KeyView.
Enable Advanced Document Readers
To enable advanced readers, you must obtain an appropriate license key from Micro Focus and pass the license key to KeyView as described in Pass License Information to KeyView. If you are enabling the MBX reader in an existing installation of Filter, in addition to updating the license key, change the parameter 208=eml to 208=mbx in the formats.ini file.

Pass License Information to KeyView
To provide license information to KeyView, do one of the following:
· Provide the license information through the API. Micro Focus recommends using this approach.
· Provide the license information as a text file named kv.lic. In earlier versions of KeyView, license information had to be stored in a file and included in the bin folder with the KeyView libraries. The ability to provide license information as a file has been deprecated and might be removed in future. You should no longer include license information in your application as a file. Micro Focus recommends that you pass license information to KeyView through the API instead.
If you have an evaluation version of KeyView and purchase a full version of the SDK, or you are adding a document reader (for example, the PST reader), you must update the license information that you pass to KeyView.
To provide license information through the API
· In the C API, provide license information when you initialize KeyView by calling fpInitWithLicenseData().
· In the C++ API, provide license information when you start a new session (see the constructor for the Session class).
· In the .NET API, provide license information to KeyView when you instantiate the Filter object.
· In the Java API, provide license information to KeyView when you instantiate the Filter object.
To provide license information as a file
1. Open or create the license key file, kv.lic, in a text editor. The file must be saved in the same directory as the KeyView libraries, and must contain your organization name and license key.
    COMPANY NAME
    XXXXXXX-XXXXXXX-XXXXXXX-XXXXXXX
2. Replace the text COMPANY NAME with the company name that appears at the top of the License Key Sheet provided by Micro Focus. Enter the text exactly as it appears in the document.
3. Replace the characters XXXXXXX-XXXXXXX-XXXXXXX-XXXXXXX with the appropriate license key from the License Key Sheet provided by Micro Focus. The license key is listed in the Key column in the Standalone Products table. The key is a string that contains 31 characters, for example, 2TQD22D-2M6FV66-2KPF23S-2GEM5AB. Enter the characters exactly as they appear in the document, including the dashes, but do not include a leading or trailing space.
4. The finished kv.lic file looks similar to the following:
    Autonomy
    24QD22D-2M6FV66-2KPF23S-2G8M59B
5. Save the file.
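Because the key must be entered exactly, with dashes and without surrounding spaces, an application might check the shape of a key before passing it on. The following is a minimal sketch and not part of the KeyView API: the class and method names are hypothetical, and the pattern (four groups of seven uppercase letters or digits) is an assumption inferred from the sample keys in this section. Only the 31-character length is stated by the guide.

```java
import java.util.regex.Pattern;

final class LicenseKeyCheck {
    // Assumed shape, inferred from the sample keys in this guide:
    // four groups of seven uppercase letters or digits, separated by dashes.
    private static final Pattern KEY_SHAPE =
            Pattern.compile("[A-Z0-9]{7}(-[A-Z0-9]{7}){3}");

    /** Returns true if the key is exactly 31 characters and matches the assumed shape. */
    static boolean looksLikeLicenseKey(String key) {
        return key != null
                && key.length() == 31
                && KEY_SHAPE.matcher(key).matches();
    }

    public static void main(String[] args) {
        // Sample key quoted in the procedure above.
        System.out.println(looksLikeLicenseKey("2TQD22D-2M6FV66-2KPF23S-2GEM5AB"));
        // Same key with a leading space, which the procedure says to avoid.
        System.out.println(looksLikeLicenseKey(" 2TQD22D-2M6FV66-2KPF23S-2GEM5AB"));
    }
}
```

A check like this only catches copy-and-paste mistakes; it cannot tell whether a key is valid for your organization.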


Directory Structure

The following table describes the contents of the Filter SDK.
The variable OS is the operating system for which the SDK is installed. For example, the bin directory on a standard 32-bit Windows installation would be located at KeyviewFilterSDK\WINDOWS\bin.

Installed directory structure

Directory               Description
OS\bin                  Contains the libraries, the format detection file formats.ini, and other supporting files, as well as the C programs filter and filtertest, which you can use to test your custom document readers (see Develop a Custom Reader, on page 249).
OS\lib                  (Solaris installations only) Contains the redistributable libstlport.so.1 library, which is required to run KeyView on Solaris platforms.
dotnetapi               Contains the source files for the .NET API.
dotnetapi\dotnethelp    Contains the help for the .NET API.
dotnetapi\sample        Contains the sample programs for the .NET API.
cppapi                  Contains the source files for the C++ API.
cppapi\sample           Contains the sample programs for the C++ API.
guide                   Contains the KeyView Filter SDK programming guides in PDF and HTML format.
include                 Contains the header files required for Filter.
javaapi\javadoc         Contains the Javadoc for the Java API.
javaapi\sample          Contains the source files and sample programs for the Java API.
rel_notes               Contains the KeyView Filter SDK Release Notes in PDF format.
samples\filter          Contains the source code for the filter sample program demonstrating the Filter interface for the C API.
samples\pdfini          Contains the initialization file used to extract custom metadata from PDF documents.
samples\tstxtract       Contains a C sample program demonstrating the File Extraction interface.
samples\utf8sr          Contains the source for the sample document reader for UTF-8 files. You can use this to create your own custom document readers.


Chapter 2: Getting Started

This section provides an overview of Filter SDK, and describes how to use the Java implementation of the API.

· Architectural Overview
· File Caching
· Filtering
· Subfile Extraction
· Use the Java Implementation of the API
· The Filter Process Model
· Run File Detection In or Out of Process
· Stream Data to Filter

Architectural Overview
The general architecture of the KeyView Filter technology is the same across all supported platforms and is illustrated in the following diagram.

Each component is described in the following table.

Architectural Components

Component                  Description
Developer's Application    The developer's application interfaces directly with the Filter API through a C-language, Java, or .NET implementation.
File Extraction API        The File Extraction API opens a file and extracts the file's subfiles so they are exposed for filtering. See Use the File Extraction API, on page 33.
Filter API                 The Filter API exposes the filtering functionality and controls all other modules during the filtering process. See Use the Filter API, on page 55.
Format Detection           This module determines the file type of the input stream, allowing the Filter API to return that information to the developer's application, or to load the appropriate structured access layer for further processing. See File Format Detection, on page 234 for more information on format detection.
Structured Access Layer    Three modules reside in the structured access layer: one each for word processing, spreadsheet, and presentation formats. The file detection result determines which structured access layer module is used during the filtering process. That module loads the appropriate document reader and proceeds with text extraction or metadata retrieval.
Document Readers           Each document reader reads a specific file format and sends a text stream of the document to the structured access layer. Each reader is loaded as required by the structured access layer. See Document Readers, on page 241 for a complete list of document readers.

File Caching
To reduce the frequency of I/O operations, and consequently improve performance, the KeyView readers load file data into memory. The readers then read the data from the cache rather than the physical disk. You can configure the amount of memory used for file caching through the formats.ini file. Generally, when you increase the memory, performance will improve.
By default, KeyView uses a maximum of 1 MB of memory for each thread. If the file data is larger than 1 MB, up to 1 MB of data is cached and the data beyond 1 MB is read from disk. The minimum amount of memory that can be used for file caching is 64 KB.
To determine a reasonable value, divide the maximum amount of memory you want KeyView to use for file caching by the total number of threads. For example, if you want KeyView to use a maximum of 50 MB of memory and have 10 threads, set the value to 5 MB (DiskCacheSize=5120).
To modify the memory allocated for file caching, change the value for the following parameter in the [DiskCache] section of the formats.ini file:
DiskCacheSize=1024
The value is in kilobytes. If this parameter is not set or is set to 0 (zero), the minimum value of 64 KB is used.


The formats.ini file is in the directory install\OS\bin, where install is the pathname of the Filter installation directory and OS is the name of the operating system.
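The sizing rule for file caching can be sketched as a small calculation. This is an illustration only and not part of the KeyView API: the class and method names are hypothetical. The facts taken from this section are the kilobyte unit, the 64 KB minimum, and the [DiskCache] section and DiskCacheSize parameter names.

```java
final class DiskCacheSizing {
    // Minimum cache size stated in this section, in kilobytes.
    static final int MIN_CACHE_KB = 64;

    /**
     * Divides a total memory budget (in MB) evenly across threads and
     * returns the per-thread cache size in KB, never below the minimum.
     */
    static int perThreadCacheKb(int totalBudgetMb, int threads) {
        if (totalBudgetMb <= 0 || threads <= 0) {
            throw new IllegalArgumentException("budget and thread count must be positive");
        }
        long perThreadKb = (long) totalBudgetMb * 1024 / threads;
        return (int) Math.max(MIN_CACHE_KB, perThreadKb);
    }

    /** Formats the value as it would appear in formats.ini. */
    static String iniFragment(int cacheKb) {
        return "[DiskCache]\nDiskCacheSize=" + cacheKb;
    }

    public static void main(String[] args) {
        // The example from the text: 50 MB shared by 10 threads is 5 MB (5120 KB) each.
        System.out.println(iniFragment(perThreadCacheKb(50, 10)));
    }
}
```

With a very small budget and many threads, the helper falls back to 64 KB, which mirrors the minimum that the section describes.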
Filtering
Filter SDK enables you to filter many different types of documents. Filtering is the process of extracting the text from a document without the application-specific markup. However, the filtering process can also include the following:
· Subfile extraction: exposes all subfiles for filtering. See Use the File Extraction API, on page 33.
· File format extraction: detects a file's format, and reports the information to the API, which in turn reports the information to the developer's application. See File Format Detection, on page 234.
· Metadata extraction: extracts selected metadata (document properties) from a file. See Extract Metadata, on page 59.
· Character set conversion: controls the character set of both the input and the output text. See Convert Character Sets, on page 62.
Subfile Extraction
To filter a file, you must first determine whether the file contains any subfiles (attachments, embedded OLE objects, and so on). A file that contains subfiles is called a container file. Archive files (such as ZIP), mail messages with attachments (such as Microsoft Outlook Express), mail stores (such as Microsoft Outlook Personal Folders), and compound documents with embedded OLE objects (such as a Microsoft Word document with an embedded Excel chart) are examples of container files.
If the file is a container file, the container must be opened and its subfiles extracted using the File Extraction interface. The extraction process is done repeatedly until all subfiles are extracted and exposed for filtering. Once a subfile is extracted, you can use the Filter API to filter the file.
If a file is not a container, you should pass it directly to the Filter API for filtering without extraction. The ExtractFilter sample program demonstrates this logic for extracting and filtering files. See Use the File Extraction API, on page 33 for more information.
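The extract-then-filter logic described in this section can be sketched with a work stack. This is a simplified model only, and it does not use the KeyView File Extraction or Filter APIs: the Doc class is a hypothetical stand-in for a file, and "filtering" is represented by recording the file name. See the ExtractFilter sample program for the real calls.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class ExtractThenFilterSketch {

    /** Hypothetical stand-in for a file; not a KeyView type. */
    static final class Doc {
        final String name;
        final List<Doc> subfiles; // empty when the file is not a container

        Doc(String name, Doc... subfiles) {
            this.name = name;
            this.subfiles = Arrays.asList(subfiles);
        }

        boolean isContainer() {
            return !subfiles.isEmpty();
        }
    }

    /** Returns the names of the files that would be passed to the filter, in order. */
    static List<String> process(Doc root) {
        List<String> filtered = new ArrayList<>();
        Deque<Doc> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            Doc doc = pending.pop();
            if (doc.isContainer()) {
                // Container: expose its subfiles and examine each one in turn.
                // Pushing in reverse keeps the original subfile order.
                for (int i = doc.subfiles.size() - 1; i >= 0; i--) {
                    pending.push(doc.subfiles.get(i));
                }
            } else {
                // Not a container: this is where the file would be filtered.
                filtered.add(doc.name);
            }
        }
        return filtered;
    }

    public static void main(String[] args) {
        Doc outer = new Doc("outer.zip",
                new Doc("a.txt"),
                new Doc("inner.zip", new Doc("b.txt")));
        System.out.println(process(outer)); // prints [a.txt, b.txt]
    }
}
```

An explicit stack avoids deep recursion when containers are nested many levels, which can happen with mail stores that hold archives that hold further messages.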
Use the Java Implementation of the API
The Java version of the Filter API provides an interface to the core functionality of the C API. It contains one primary class (Filter) that wraps the filter functionality of the C API. It is implemented in the package com.verity.api.filter contained in the file KeyView.jar. The JAR file is in the directory install\javaapi, where install is the path name of the Filter installation directory.

For more information on the Java API, see the Javadoc in the directory install\javaapi\javadoc, and Sample Programs, on page 89.
Input/Output Operations
Methods in the Filter Java API have signatures that support a variety of input and output methods. The input source can usually be a physical file accessed through a file path, a com.verity.api.SeekableInputStream, or a standard java.io.InputStream. You can send the output to a file or java.io.OutputStream, or return it one chunk at a time in a byte array.
You can set the input source by calling the setInputSource method. Alternatively, you can supply it as a parameter when you use the doFilter, canFilter, canFilterEx, getDocFormatInfo, or getSummaryInfo methods.
KeyView needs to access different parts of files while it is filtering. When the input source is a stream, Micro Focus recommends passing a SeekableInputStream into KeyView, because it allows KeyView to read only the parts of the stream it needs. If you use a Java InputStream, KeyView must store the stream as it is received, writing to a temporary file if the stream is large.
If you use a Java InputStream as the source, two method signatures are available. One method signature allows you to pass in the stream size. If you do not supply the stream size, KeyView reads the entire stream before processing starts. If you can provide the stream size, KeyView might not need to read the whole stream.
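To see why a forward-only stream is more expensive than a seekable source, consider what any consumer must do to read an arbitrary offset from a plain java.io.InputStream: store the data first. The following sketch uses only standard Java classes to illustrate that idea. It is not KeyView's implementation and does not use the SeekableInputStream interface; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

final class SpoolStreamSketch {

    /** Copies a forward-only stream to a temporary file so it can be read at any offset. */
    static File spoolToTempFile(InputStream in) throws IOException {
        File temp = File.createTempFile("spool", ".tmp");
        temp.deleteOnExit();
        // createTempFile already created the file, so it must be replaced.
        Files.copy(in, temp.toPath(), StandardCopyOption.REPLACE_EXISTING);
        return temp;
    }

    /** Reads length bytes starting at offset without reading the bytes before it. */
    static byte[] readAt(File file, long offset, int length) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            byte[] buffer = new byte[length];
            raf.seek(offset);
            raf.readFully(buffer);
            return buffer;
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(
                "HEADER|BODY|TRAILER".getBytes(StandardCharsets.US_ASCII));
        File spooled = spoolToTempFile(in);
        // Jump straight to the last section of the data.
        System.out.println(new String(readAt(spooled, 12, 7), StandardCharsets.US_ASCII)); // prints TRAILER
    }
}
```

The copy step is the cost that a seekable source avoids: with random access available from the start, only the requested ranges need to be read.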
Filter in File or Stream Mode
To filter files using the methods in the Filter class
1. Instantiate a Filter object using either the default constructor or the constructor that sets the output character set and filter flags:
a. Use the default constructor Filter(). For example:
m_objFilter = new Filter();
b. Use the constructor Filter(java.lang.String outputCharSet, long filterFlags). For example:
m_objFilter = new Filter(outputCharSet, Filter.FILTERFLAG_OOPLOGON);
The Filter flags provide instructions on how to process a file or stream. For example, they specify whether an error log is generated during filtering (FILTERFLAG_OOPLOGON) or whether headers and footers are extracted from the document (FILTERFLAG_HEADERFOOTERTAGS).
NOTE: Filter runs out of process by default. See The Filter Process Model, on page 25 for more information.
2. Set the location of the Filter libraries by calling the setFilterDirectory(java.lang.String directory) method. These libraries are normally stored in the directory install\OS\bin, where install is the path name of the Filter installation directory and OS is the name of the operating system. For example:

m_objFilter.setFilterDirectory(m_filterDirectory);

3. Set the input source as either a file or input stream by calling the setInputSource method.

m_objFilter.setInputSource(m_extractDir + filename);

4. Filter the file or stream by calling either the filterTo or doFilterChunk method. The filterTo method extracts the data to a file or a stream. The doFilterChunk method extracts one chunk of data from a file or a stream. It must be called repeatedly until the entire buffer is filtered.

If filtering in file mode, use the following code:

{
    m_objFilter.filterTo(m_extractDir + filename + m_extension);
}

If filtering in stream mode, use the following code:

{
    outf = new File(m_extractDir + filename + m_extension);
    fos = new FileOutputStream(outf);
    m_objFilter.filterTo(fos);
    fos.close();
}

5. Terminate the filtering session and free allocated system resources by calling the shutdownFilter() method.

m_objFilter.shutdownFilter();
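Because step 5 releases system resources, it is worth making sure it runs even when filtering throws. The following is a minimal sketch of that pattern. FilterSession is a hypothetical stand-in interface that mirrors only the method names used in the steps above, so that the example compiles without KeyView.jar; it is not the real Filter class, and the real methods may declare exceptions that are not shown here.

```java
final class FilterLifecycleSketch {
    /** Hypothetical stand-in for the subset of Filter methods used in steps 3 to 5. */
    interface FilterSession {
        void setInputSource(String path);
        void filterTo(String outputPath);
        void shutdownFilter();
    }

    /** Filters one file and always ends the session, even if filtering fails. */
    static void filterOneFile(FilterSession session, String inputPath, String outputPath) {
        try {
            session.setInputSource(inputPath);
            session.filterTo(outputPath);
        } finally {
            // Step 5: runs on success and on failure alike.
            session.shutdownFilter();
        }
    }
}
```

An application that filters many files with one Filter object would keep the try block around the whole batch instead, and call shutdownFilter() once at the end.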

Multithreaded Filtering
To ensure multithreaded filter processes are thread-safe, you must create a unique Filter context for every thread by instantiating a Filter object. In addition, threads must not share context objects, and the same context object must be used for all API calls in the same thread. Creating a context object for every thread does not affect performance because the context object uses minimal resources. For example, your Java code should have the following logic in a thread:
m_objFilter = new Filter();
m_objFilter.setFilterDirectory(m_filterDirectory);
m_objFilter.setInputSource(infile);
m_objFilter.getDocFormatInfo();
if (m_objFilter.canFilter())
    m_objFilter.filterTo(outfile);
m_objFilter.shutdownFilter();
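The rule that no context is shared between threads can be sketched with a standard thread pool. In this example, FilterContext is a hypothetical stand-in for a Filter object (it only records which threads touched it), and PerThreadContextSketch and filterAll are illustrative names. Each task creates, uses, and shuts down its own context inside the worker thread, so no context ever crosses a thread boundary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PerThreadContextSketch {
    /** Hypothetical stand-in for a Filter object; records the threads that use it. */
    static final class FilterContext {
        final Set<String> threadsSeen = ConcurrentHashMap.newKeySet();

        void filter(String file) {
            threadsSeen.add(Thread.currentThread().getName());
        }

        void shutdown() {
            // A real context would release its resources here.
        }
    }

    /** Filters every file on a pool, with one private context per task. */
    static List<FilterContext> filterAll(List<String> files, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<FilterContext>> pending = new ArrayList<>();
            for (String file : files) {
                pending.add(pool.submit(() -> {
                    FilterContext ctx = new FilterContext(); // created in the worker thread
                    try {
                        ctx.filter(file);
                    } finally {
                        ctx.shutdown();
                    }
                    return ctx;
                }));
            }
            List<FilterContext> used = new ArrayList<>();
            for (Future<FilterContext> f : pending) {
                used.add(f.get());
            }
            return used;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Creating a context per task rather than per pooled thread is a design choice made here for simplicity; it relies on the statement above that a context object uses minimal resources.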

Before Running Your Application
Before running your application you must set the library path using one of the following methods:
• On Windows, add the location of KeyViewFilter.dll to the PATH environment variable.
• On Solaris, Linux, and HP-UX IA-64, add the location of libKeyViewFilter.so to the LD_LIBRARY_PATH environment variable.
• On HP-UX PA-RISC, add the location of libKeyViewFilter.sl to the SHLIB_PATH environment variable.
• On AIX, add the location of libKeyViewFilter.a to the LIBPATH environment variable.
• You can also specify the library path as a system property as follows:
java -Djava.library.path=filter_bin_directory ...
The Filter Process Model
By default, Filter runs independently from the calling application process. This is called out-of-process filtering. Out-of-process filtering protects the stability of the calling application in the rare case when a malformed document causes Filter to fail.
You can configure Filter to run in the same process as the calling application. This is called in-process filtering. However, it is strongly recommended you run Filter out of process whenever possible.
The creation of child processes on UNIX usually adheres to Portable Operating System Interface (POSIX) standards. AIX uses different thread semantics. If required, a version of kvfilter with POSIX thread semantics is available for AIX. This file is kvfilter_nsl.a. It must be renamed to kvfilter.a to be used by Filter.
To monitor and debug filtering operations during out-of-process filtering, you can generate an error log at run time. See Generate an Error Log, on page 55.
The following methods run in process or out of process:
Filter API
• canFilter
• canFilterEx
• doFilter
• doFilterChunk
• getSummaryInfo
• getDocFormatInfo

File Extraction API
• extCloseDocument
• extGetSubFileInfo
• extGetSubFileMetadata
• extExtractSubFile
• extGetMainFileInfo
• extOpenDocument
• getSummaryInfo
• KVGetExtractInterface()

Other Filter API methods always run in process.

Persist the Child Process
By default, in out-of-process filtering, the parent process maintains a persistent connection with the child server after each file is filtered. When the connection is preserved in this way, subsequent filtering requests are processed more quickly because the server is already prepared to receive data. You can restart the server at regular intervals by using a method or a configuration setting.
In the API
To force KeyView to restart, call the refreshFilterKVOOP() method.
public void refreshFilterKVOOP();
In the formats.ini File
To control whether Filter persists the server, use the kvoopRefresh parameter in the [FilterSDK_Config] section of the formats.ini file:
kvoopRefresh=0 When this is set to 0 (zero), the connection to the server is persisted for as long as the parent process is running or until the server fails. This is the default.
kvoopRefresh=n When this is set to n, the connection is persisted for n filter requests. After the nth request, the server is shut down and restarted before processing the next request. For example, if kvoopRefresh=5, the connection to the server is persisted for 5 filter requests. For the 6th request, the server is shut down and restarted.
To control whether the parent process attempts to filter a file after the file has caused the server to fail, use the kvoopRetry parameter in the [FilterSDK_Config] section of the formats.ini file:
kvoopRetry=0 When this is set to 0 and the server fails, the parent process does not resend the file to a new server.
kvoopRetry=n When this is set to n (a positive number) and the server fails, the parent process resends the file to a new server n times. By default, kvoopRetry is set to 1, and the file is resent to a server once.
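The counting rules for these two parameters can be checked with a little arithmetic. The sketch below only models the behavior as it is worded above; the class and method names are hypothetical and nothing here reads formats.ini or talks to a server.

```java
final class KvoopRefreshSketch {
    /**
     * Number of scheduled server restarts while handling the given number of
     * filter requests. With kvoopRefresh=n (n > 0) the server is restarted after
     * every nth request, before the next request is processed. With 0 there are
     * no scheduled restarts.
     */
    static int scheduledRestarts(int kvoopRefresh, int requests) {
        if (kvoopRefresh <= 0 || requests <= 0) {
            return 0;
        }
        return (requests - 1) / kvoopRefresh;
    }

    /**
     * Total attempts for a file that makes the server fail every time:
     * the first send plus kvoopRetry resends.
     */
    static int attemptsForFailingFile(int kvoopRetry) {
        return 1 + Math.max(0, kvoopRetry);
    }
}
```

With kvoopRefresh=5, five requests cause no restart and the sixth triggers the first one. With the default kvoopRetry=1, a file that always fails is attempted twice in total.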

The formats.ini file is in the directory install\OS\bin, where install is the path name of the Filter installation directory and OS is the name of the operating system.
NOTE: The kvoopRefresh and kvoopRetry parameters do not apply when running the File Extraction functions out of process. See Run File Extraction Functions Out of Process, below.
Run Filter In Process
By default, Filter runs out of process. However, you can enable in-process filtering through the API or in the formats.ini file. If the type of process is not specified in the formats.ini or in the API, then Filter is run out of process. If the type of process is specified in the formats.ini and in the API, the setting in the API takes precedence.
In the API
To run Filter in process, instantiate the Filter object using the constructor Filter(java.lang.String outputCharSet, long filterFlags), and set the filterFlags argument to FILTERFLAG_INPROCESS.
objFilter = new Filter(outputCharSet, Filter.FILTERFLAG_INPROCESS);
In the formats.ini File
To run Filter in process, set the following parameter in the [FilterSDK_Config] section of the formats.ini file to 1:
default_inprocess=1
By default this is set to 0 (zero), which enables out-of-process filtering.
The formats.ini file is in the directory install\OS\bin, where install is the path name of the Filter installation directory and OS is the name of the operating system.
Run File Extraction Functions Out of Process
The out-of-process setting specified when you create the Filter object or in the formats.ini is automatically propagated to the File Extraction API. When you extract subfiles from container files and pass the files for filtering out of process, Filter generates a server called kvoop.exe for filtering and a duplicate server also called kvoop.exe for file extraction. These servers are independent, so if the filtering service stops responding, the file extraction service can continue extracting files uninterrupted.
Restart the File Extraction Server
If the file extraction server fails on a file and throws the exception KVError_InvalidOopDriverSignature or KVError_InvalidOopServiceSignature, you must restart the server by recreating the Filter object, and process the source file again.

Out-of-Process Logging
Logging is available for out-of-process filtering. The kvoop server can now create a log file that captures information on the files being processed, storing one entry per process. The generated log file is called xxxx_kvoop.log, where xxxx is a unique number identifying the process. In the rare case when the kvoop server fails, you can use the log files to determine which file caused the failure. After processing is complete and the system shuts down, the logs are automatically deleted. To keep the log files after processing is successfully completed, see Keep Log Files, on the next page.
NOTE: Out-of-process logging is available only on certain platforms (see Out-of-process logging in the platform differences section).
Enable Out-of-Process Logging
To enable out-of-process logging, set the KVOOP_LOGS_DIR environment variable to the directory in which you want the log files to be stored. By default, logging is not enabled.
On UNIX, the variable is set as follows:
setenv KVOOP_LOGS_DIR /tmp
On Windows, the variable is set as follows:
set KVOOP_LOGS_DIR=c:\tmp
The following log file is created in the directory:
process_id_kvoop.log
where process_id is a numeric value representing the logged process. New messages are appended to the file, and truncation is disabled by default.
If KeyView terminates unexpectedly and Windows minidump is enabled, a process_id_crash_info.txt file is generated (see Enable Windows Minidump, on the next page). If logging had not been enabled at the time of termination, this file contains instructions on how to enable logging.
Set the Verbosity Level
You can control how much information is written to the file by setting the KVOOP_LOG_VERBOSITY environment variable. For example:
set KVOOP_LOG_VERBOSITY=1
The variable can be set to the following:
1 Include only error messages.
2 Include errors and warnings.
3 Include errors, warnings, and general information. This is the default.
4 Include all possible information. This setting is useful for debugging purposes.

Enable Windows Minidump
KeyView can use the Windows minidump feature to provide additional logging information, which can be useful for debugging purposes.
The Windows minidump is disabled by default. To enable the Windows minidump, set KVOOP_DUMP_ENABLE=1.
If an unexpected termination occurs after the minidump is enabled, three files are generated:
• process_id_crash_info.txt. This file contains KVOOP state and runtime information at the time of termination. If logging was not enabled at the time of termination, this file contains instructions on how to enable logging.
• process_id_process_list.txt. This file contains information from the DLLs that were loaded at the time of the termination.
• process_id_report.dmp. The Windows dump file, which contains further information about the termination. You can open it with either a Windows debugger or autnhelper.exe (you must copy this file to the same directory).
You can control the amount of information presented in the Windows dump file by creating the following files in the directory:
dumper.NORMAL
dumper.WITHDATASEGS
dumper.WITHFULLMEMORY
dumper.WITHHANDLEDATA
Keep Log Files
After processing is complete and the system is shut down, the log files are automatically deleted from the directory. To keep the log files after a successful run, set the KVOOP_KEEP_LOGS environment variable.
On UNIX, set the variable as follows:
setenv KVOOP_KEEP_LOGS 1
On Windows, set the variable as follows:
set KVOOP_KEEP_LOGS=1
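If the filtering application is itself started from Java, the logging variables described above can be set on the child process environment instead of in a shell. The following is a minimal sketch using the standard java.lang.ProcessBuilder class. The class name KvoopLoggingLauncherSketch and the method withKvoopLogging are hypothetical; the sketch assumes that variables placed in the environment of the launched process are seen in the same way as variables set with set or setenv.

```java
import java.util.Map;

final class KvoopLoggingLauncherSketch {
    /**
     * Adds the out-of-process logging variables to the environment of the
     * process that the given builder will start, and returns the same builder.
     */
    static ProcessBuilder withKvoopLogging(ProcessBuilder builder, String logsDir,
                                           int verbosity, boolean keepLogs) {
        if (verbosity < 1 || verbosity > 4) {
            throw new IllegalArgumentException("verbosity must be between 1 and 4");
        }
        Map<String, String> env = builder.environment();
        env.put("KVOOP_LOGS_DIR", logsDir);                          // enables logging
        env.put("KVOOP_LOG_VERBOSITY", Integer.toString(verbosity)); // 1 = errors only, 4 = everything
        if (keepLogs) {
            env.put("KVOOP_KEEP_LOGS", "1");                         // keep logs after a successful run
        }
        return builder;
    }
}
```

A caller would build the command line for its own application, pass the builder through withKvoopLogging, and then call start() on the result.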
Run File Detection In or Out of Process
By default, detection runs in out-of-process mode. However, you can enable in-process detection through the API or in the formats.ini file. If the type of process is not specified in the formats.ini or in the API, Filter runs in out-of-process mode. If the type of process is specified in the formats.ini and in the API, the setting in the API takes precedence.


Specify the Process Type In the formats.ini File
Add the default_detect_inprocess flag to a [FilterSDK_Config] section in the formats.ini file to control the default behavior for detection. Set default_detect_inprocess to 0 for out-of-process detection, and 1 for in-process detection. For example:
[FilterSDK_Config]
default_detect_inprocess=0
If this flag is not specified, the file detection behavior is determined by the default_inprocess flag for filtering. For example, if you set default_inprocess to 1, filtering and file detection run in in-process mode by default; if you set default_inprocess to 0, filtering and file detection run in out-of-process mode by default.
If both the default_inprocess and default_detect_inprocess flags are set, then default_inprocess controls the default filtering behavior and default_detect_inprocess controls the default file detection behavior.
Specify the Process Type In the API
To run detection in in-process mode, instantiate the Filter object by using the constructor Filter(java.lang.String outputCharSet, long filterFlags), and set the filterFlags argument to FILTERFLAG_DETECTINPROCESS. To run detection in out-of-process mode, set FILTERFLAG_DETECTOUTOFPROCESS.
objFilter = new Filter(outputCharSet, Filter.FILTERFLAG_DETECTINPROCESS);
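The precedence between the API flag and the two formats.ini parameters can be summarized as a small decision function. This is a sketch of the rules as they are described in this section, not KeyView code: ProcessModeSketch, Mode, and detectMode are hypothetical names, and a null argument stands for a setting that was not specified.

```java
final class ProcessModeSketch {
    enum Mode { IN_PROCESS, OUT_OF_PROCESS }

    /**
     * Resolves the detection mode. Order of precedence, as described above:
     * the API flag, then default_detect_inprocess, then default_inprocess,
     * then the out-of-process default.
     */
    static Mode detectMode(Mode apiFlag, Integer defaultDetectInprocess, Integer defaultInprocess) {
        if (apiFlag != null) {
            return apiFlag;
        }
        if (defaultDetectInprocess != null) {
            return fromIni(defaultDetectInprocess);
        }
        if (defaultInprocess != null) {
            return fromIni(defaultInprocess);
        }
        return Mode.OUT_OF_PROCESS;
    }

    /** formats.ini uses 1 for in-process and 0 for out-of-process. */
    private static Mode fromIni(int value) {
        return value == 1 ? Mode.IN_PROCESS : Mode.OUT_OF_PROCESS;
    }
}
```

For example, with default_inprocess=1 and default_detect_inprocess=0 in formats.ini and no detection flag passed to the constructor, the function returns OUT_OF_PROCESS.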

Stream Data to Filter

By default, when you run Filter out-of-process, and pass file streams to the API (instead of file names), Filter uses temporary files during communication.
When running out-of-process, you can configure KeyView to stream the file data while it processes it, rather than creating temporary files. This method is particularly beneficial if you do not want to process the whole file (for example, if you want to stop after filtering only some of the text, or extract only some of the subfiles).

NOTE: This option is disabled by default because for some files it might result in a longer processing time when you do need to process the whole file.

To turn on streaming mode, you can either:
• Set at least one of the following streaming parameters in the [FilterSDK_Config] section of the formats.ini to pipe:

streaming_method          Set this parameter to pipe to change the overall behavior for filtering and extraction to use streaming mode. By default this parameter is set to temp, which uses temporary files during the filter process.
filter_streaming_method   Set this parameter to pipe to configure filtering to use streaming mode. If you do not set this parameter, KeyView uses the value of streaming_method.
extract_streaming_method  Set this parameter to pipe to configure extraction to use streaming mode. If you do not set this parameter, KeyView uses the value of streaming_method.

• Set the filter streaming options in the API.
The streaming method has a number of advantages:
• It reduces the disk space used for temporary files.
• It improves the responsiveness for partial filtering. When using the temp_file method, your first call to filterFileToStream or filterStreamToStream does not return until the entire file has been processed. When using the pipe method, these functions return the first block of text as soon as it is available.
• It reduces the I/O for partial filtering. When you use the pipe method, it might not be necessary for KeyView to read the whole input file, especially if you choose to stop filtering before all the text has been returned.
• For many formats, it reduces the amount of the input file that is read during extraction, especially if you extract only a subset of the files.


Part II: Use Filter SDK
This section explains how to perform some basic tasks by using the File Extraction and Filter APIs, and describes the sample programs.
• Use the File Extraction API
• Use the Filter API
• Sample Programs


Chapter 3: Use the File Extraction API

This section describes how to extract subfiles from a container file using the File Extraction API.

· Introduction
· Extract Subfiles
· Extract Images
· Recreate a File Hierarchy
· Extract Mail Metadata
· Extract Subfiles from Outlook Files
· Extract Subfiles from Outlook Express Files
· Extract Subfiles from Mailbox Files
· Extract Subfiles from Outlook Personal Folders Files
· Extract Subfiles from Lotus Domino XML Language Files
· Extract Subfiles from Lotus Notes Database Files
· Extract Subfiles from PDF Files
· Extract Embedded OLE Objects
· Extract Subfiles from ZIP Files
· Default File Names for Extracted Subfiles

Introduction
To filter a file, you must first determine whether the file contains any subfiles (attachments, embedded OLE objects, and so on). A file that contains subfiles is called a container file. A container file has a main file (parent) and subfiles (children) embedded in the main file. The following are examples of container files:
• Archive files such as ZIP, TAR, and RAR.
• Mail messages such as Outlook (MSG) and Outlook Express (EML).
• Mail stores such as Microsoft Outlook Personal Folders (PST), Mailbox (MBX), and Lotus Notes database (NSF).
• PDF files that contain file attachments.
• Compound documents with embedded OLE objects such as a Microsoft Word document with an embedded Excel chart.
NOTE: Document Readers, on page 173 indicates which formats are treated as container files and are supported by the File Extraction API.

The subfiles might also be container files, creating a file hierarchy of multiple levels. For example, an MSG file (the root parent) might contain three attachments:
• a Microsoft Word document that contains an embedded Microsoft Excel spreadsheet.
• an AutoCAD drawing file (DWG).
• an EML file with an attached ZIP file, which in turn contains four archived files.

NOTE: The parent MSG file contains four first-level children. The body text of a message file, although not a standalone file in the container, is considered a child of the parent file.
Extract Subfiles
To filter all files in a container file, the container must be opened and its subfiles extracted to either a file or a stream using the File Extraction API. The extraction process is done repeatedly until all subfiles are extracted and exposed for filtering. Once a subfile is extracted, you can call Filter API methods to filter the data. If you require a container file, including subfiles, to be filtered to a single file, you must extract all files from the container, filter the files, and then append each filtered output to its parent.
To extract subfiles, follow this general procedure
1. Open the source file by calling the extOpenDocument method. This call defines the parameters necessary to open a file for extraction.
2. Determine whether the main file is a container file (contains subfiles) by calling the extGetMainFileInfo() method.
3. If the call to extGetMainFileInfo() determined the source file is a container file, proceed to Step 4; otherwise, filter the file.
4. Determine whether the subfile is itself a container (contains subfiles) by calling the extGetSubFileInfo method.
5. Extract the subfile by calling the extExtractSubFile method.
6. If the call to extGetSubFileInfo determined the subfile is a container file, repeat Step 1 through Step 5 until all subfiles are extracted and the lowest level of subfiles is reached; otherwise, filter the file.
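The control flow of this procedure is a depth-first walk over nested containers. The following sketch shows only that flow on an in-memory model. Node is a hypothetical stand-in for a file that may hold subfiles, and it makes no File Extraction API calls. As a simplification, the model treats a container purely as a wrapper, whereas a real mail message or compound document also has content of its own to filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class RecursiveExtractionSketch {
    /** Hypothetical in-memory stand-in for a file that may contain subfiles. */
    static final class Node {
        final String name;
        final List<Node> subfiles;

        Node(String name, Node... subfiles) {
            this.name = name;
            this.subfiles = Collections.unmodifiableList(Arrays.asList(subfiles));
        }

        boolean isContainer() {
            return !subfiles.isEmpty();
        }
    }

    /** Returns, in depth-first order, the names of the files that would be passed to filtering. */
    static List<String> collectFilterable(Node file) {
        List<String> out = new ArrayList<>();
        walk(file, out);
        return out;
    }

    private static void walk(Node file, List<String> out) {
        if (!file.isContainer()) {
            out.add(file.name);            // Steps 3 and 6: not a container, so filter it
            return;
        }
        for (Node sub : file.subfiles) {   // Steps 4 and 5: inspect and extract each subfile
            walk(sub, out);                // Step 6: repeat the procedure for nested containers
        }
    }
}
```

For an archive that holds a.txt and an inner archive with b.txt and c.txt, the walk yields a.txt, b.txt, and c.txt in that order.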

Sanitize Absolute Paths
When you extract a subfile from a container and write it to disk, you specify an extract directory and a path to extract the file to.
To set the path, you might use the path in the container file that you are extracting from, as returned from the Filter.extGetSubFileInfo() method. However, if the path is an absolute path, the file could be created outside the directory you have chosen as the extract directory. Your application might then contain a vulnerability that could be exploited to write files to unexpected locations in the file system. This section discusses some KeyView features that can help you secure your application by sanitizing paths.
KeyView always sanitizes relative paths that you pass in when extracting files, so that the paths remain within the extract directory you specify. For example, KeyView does not allow the use of ".." to move outside the extract directory.
KeyView can update absolute paths so that they remain within the extract directory. You can instruct KeyView to sanitize absolute paths programmatically (through the API), or by setting a parameter in the configuration file.
The following table shows the effect on some example paths.

Requested path    Path of extracted file (not sanitized)    Path of extracted file (sanitized)
file.txt          extractDir/file.txt                       extractDir/file.txt
dir/file.txt      extractDir/dir/file.txt                   extractDir/dir/file.txt
../file.txt       extractDir/file.txt                       extractDir/file.txt
/dir/file.txt     /dir/file.txt                             extractDir/dir/file.txt

To sanitize absolute paths
• Call the method setSanitizeAbsolutePaths on the ExtSubFileExtractConfig that you pass in to extExtractSubFile.
When KeyView sanitizes a path and the resulting directory does not exist, extraction fails unless you instruct KeyView to create the directory, so you might also want to call the method setCreateDirectory.
You can find the path that a file was actually extracted to from the ExtSubFileExtractInfo object that is returned from the extExtractSubFile method.

To sanitize absolute paths (through configuration)
• In the formats.ini configuration file, set the parameter SanitizeAbsoluteExtractPaths, for example:
[Options]
SanitizeAbsoluteExtractPaths=TRUE
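An application that builds output paths itself, for example before it passes them to KeyView, can apply the same idea defensively. The following sketch is an illustration only and is not KeyView's algorithm. ExtractPathSketch and confine are hypothetical names, and the rule used (drop the leading root, empty segments, ".", "..", and a Windows drive prefix) is an assumption chosen because it reproduces the sanitized column of the example table above.

```java
import java.util.ArrayList;
import java.util.List;

final class ExtractPathSketch {
    /**
     * Maps a requested path to a location under extractDir by keeping only
     * ordinary path segments. The result uses "/" as the separator.
     */
    static String confine(String extractDir, String requestedPath) {
        List<String> kept = new ArrayList<>();
        for (String segment : requestedPath.split("[/\\\\]+")) {
            if (segment.isEmpty() || segment.equals(".") || segment.equals("..")) {
                continue; // leading root, current directory, or an attempt to move upward
            }
            if (segment.endsWith(":")) {
                continue; // Windows drive prefix such as "C:"
            }
            kept.add(segment);
        }
        if (kept.isEmpty()) {
            throw new IllegalArgumentException("no file name in: " + requestedPath);
        }
        return extractDir + "/" + String.join("/", kept);
    }
}
```

With extractDir as the first argument, the requested paths file.txt, dir/file.txt, ../file.txt, and /dir/file.txt all resolve to locations inside extractDir.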
Extract Images
You can use the File Extraction API to extract images within a file. If you use this feature, images within the file behave in the same way as any other subfile. Extracted images have the name image[X].[Y], where [X] is an integer, and [Y] is the extension. The format of the image is the same as the format in which it is stored in the document.
NOTE: Turning on ExtractImages can reduce the speed of the filtering operation.
To extract images
• In the Java API, call the setExtractImages method on the filter object, for example:
filter.setExtractImages(true);
• In formats.ini, set the following parameter. (This is an alternative approach; you do not need to do this if you have configured this feature through the API.)
[Options]
ExtractImages=TRUE
Recreate a File Hierarchy
When a container file is extracted, any relationships between the subfiles in the container are not maintained. However, the File Extraction interface provides information that enables you to recreate the hierarchy. The hierarchy can be used to create a directory structure in a file system, or to categorize documents according to their relationship to each other. For example, if you use KeyView to generate text for a search engine, the hierarchical information enables your users to search for a document based on the document's parent or sibling. In addition, when the document is returned to the user, the parent and sibling documents can be returned as recommendations. The information needed to recreate a file's hierarchy is provided in the call to extGetSubFileInfo. Call this method to retrieve an object of the ExtSubFileInfo class, then use the getParentIndex() and getChildArray() methods in this object to retrieve information about the subfile's parent and children. Since you can only retrieve the first-level children in a subfile, you must call extGetSubFileInfo repeatedly until information for the leaf-node children is extracted.

Create a Root Node
Because of their structure, some container files do not contain a subfile or folder which acts as a root directory on which the hierarchy can be based. For example, subfiles in a Zip archive can be extracted, but none of the subfiles represent the root of the hierarchy. In this case, an artificial root node must be created at the top of the file hierarchy as a point of reference for each child, and ultimately to recreate the relationships. This artificial root node is an internal object, and is extracted to disk as a directory called root. Its index number is 0. To create a root node, call the setCreateNode method in the ExtOpenDocConfig object, and pass ExtOpenDocConfig to the extOpenDocument method. When a root node is created, the value returned from the getNumSubFiles method in the ExtMainFileInfo object includes the root node. For example, when you call extGetMainFileInfo on a Microsoft Word document with three embedded OLE objects and the root node is disabled, the number of subfiles is 3. If you create a root node, the number of subfiles is 4.
Example
For example, you might extract a PST file that contains seven subfiles with a root node enabled. The call to extGetMainFileInfo() returns the number of subfiles as 8 (seven subfiles and one root node). The following diagram shows the structure and the available hierarchy information after the subfiles are extracted:
Extracted PST File
The parentIndex specifies the index number of a subfile's parent. The childArray specifies an array of a subfile's children. With this information, you can recreate the hierarchy shown in the following diagram:

Recreated File Hierarchy
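Once the parent index of every subfile has been collected, rebuilding the tree is a matter of inverting that relation. The sketch below works on a plain int array and does not call the File Extraction API. HierarchySketch and its methods are hypothetical names, and the use of -1 as the parent of the root node is an assumption made for this example rather than a documented return value. The array must be well formed, with every chain of parents ending at the root.

```java
import java.util.ArrayList;
import java.util.List;

final class HierarchySketch {
    /** Builds, for every subfile index, the list of its first-level children. */
    static List<List<Integer>> childrenOf(int[] parentIndex) {
        List<List<Integer>> children = new ArrayList<>();
        for (int i = 0; i < parentIndex.length; i++) {
            children.add(new ArrayList<Integer>());
        }
        for (int i = 0; i < parentIndex.length; i++) {
            int parent = parentIndex[i];
            if (parent >= 0) {            // -1 marks the root in this sketch
                children.get(parent).add(i);
            }
        }
        return children;
    }

    /** Returns the chain of indexes from the root down to the given subfile. */
    static List<Integer> pathFromRoot(int[] parentIndex, int index) {
        List<Integer> path = new ArrayList<>();
        for (int i = index; i >= 0; i = parentIndex[i]) {
            path.add(0, i);               // prepend, so the root ends up first
        }
        return path;
    }
}
```

The path returned by pathFromRoot can be turned into a directory path on disk, or stored alongside the filtered text so that a search application can look up a document's parent and siblings.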

Extract Mail Metadata
You can extract metadata such as subject, sender, and recipient from subfiles of mail formats by calling the extGetSubFileMetadata() method. You can extract a predefined set of metadata fields, or a list of metadata fields by their names or MAPI properties.

Default Metadata Set
KeyView internally defines a set of common mail metadata fields that can be extracted as a group from mail formats. This default metadata set is listed in the following table.

Default Mail Metadata List

Field Name (string to specify)    Description
From       The display name and email address of the sender.
Sent       The time the message was sent.
To         The display names and email addresses of the recipients.
Cc         The display names and email addresses of recipients who receive copies of the email.
Bcc        The display names and email addresses of recipients who received blind copies of the email.
Subject    The text in the subject line of the message.
Priority   The priority applied to the message.

Because mail formats use different terms for the same fields, the format's reader maps the default field name to the appropriate format-specific name. For example, when retrieving the default metadata set, the NSF field Importance is mapped to the name Priority and is returned.
You can also extract the default field names individually by passing the field name (such as From, To, and Subject); however, in this case, the string is not mapped to the format-specific name. For example, if you pass Priority in the call, you will retrieve the contents of the Priority field from an MBX file, but will not retrieve the contents of the Importance field from an NSF file.
NOTE: You cannot pass the field names listed in MSG-Specific Metadata List, on the next page individually for PST files. However, you can pass either the MAPI tag number or one of the constants in the Filter class as integers. See Microsoft Personal Folders File (PST) Metadata, on page 42.

Extract the Default Metadata Set
To extract the default metadata set, call the extGetSubFileMetadata(long docContextID, int nSubFileIndex, ExtSubFileMetaConfig config) method. For example:
ExtSubFileMetaConfig metaConfig = new ExtSubFileMetaConfig();
ExtSubFileMetadata subfilemeta = null;
subfilemeta = m_objFilter.extGetSubFileMetadata(extContextID, index, metaConfig);

Extract All Metadata
KeyView can extract all metadata from subfiles of MSG, EML, MBX, MIME, NSF, ICS, and DXL mail containers.
To extract all metadata, call the setAllMetadata() method of the ExtSubFileMetaConfig object, and pass ExtSubFileMetaConfig to the extGetSubFileMetadata method. For example:
config = new ExtSubFileMetaConfig();
config.setAllMetadata(true);
subFileMetadata = export.extGetSubFileMetadata(extContextID, i, config);


Microsoft Outlook (MSG) Metadata
In addition to the default metadata set, the metadata fields listed in the following table can be extracted for MSG files. The field name must be passed to metaNameArray in the call to the extGetSubFileMetadata() method.

MSG-Specific Metadata List

Field Name (string to specify)    Description
AttachFileName         An attachment's long file name and extension, excluding path.
ConversationTopic      The topic of the first message in a conversation thread. A conversation thread is a series of messages and replies. This is the first message's subject with any prefix removed.
CreationTime           The time the message or attachment was created. This value is displayed in the Sent field in the message's Properties dialog in Outlook.
InternetMessageID      The identifier for messages that come in over the Internet. This is the MAPI property PR_INTERNET_MESSAGE_ID. This property is not in the MAPI headers or MAPI documentation.
LastModificationTime   The time the message or attachment was last modified. This value is displayed in the Modified field in the message's Properties dialog in Outlook.
Location               The physical location of the event specified in the Outlook calendar entry.
MessageID              The message transfer system (MTS) identifier for the message transfer agent (MTA). This value is displayed on the Message ID tab in the message's Properties dialog in Outlook.
Received               The date and time a message was delivered. This value is displayed in the Received field in the message's Properties dialog in Outlook.
Sender                 The name and email address of the message sender. This value is a concatenation of two MAPI properties in the following format: "PR_SENDER_NAME" <PR_SENDER_EMAIL_ADDRESS>. The Sender value might be the same as or different than the default metadata From value (see Default Metadata Set, on page 38), depending on which MAPI properties exist in the MSG file.
Sensitivity            The value indicating the message sender's opinion of the sensitivity of a message, such as Personal, Private, or Confidential. This value is displayed in the Sensitivity field in the message's Properties dialog in Outlook.
TransportMsgHeaders    Contains transport-specific message envelope information. This value corresponds to the MAPI property PR_TRANSPORT_MESSAGE_HEADERS.
StartDate              Contains an appointment start date. This value corresponds to the PR_START_DATE MAPI property.
EndDate                Contains an appointment end date. This value corresponds to the PR_END_DATE MAPI property.

Extract MSG-Specific Metadata
To extract specific metadata fields from an MSG file, use the method extGetSubFileMetadata(long docContextID, int nSubFileIndex, java.lang.String[] metaNameArray, ExtSubFileMetaConfig config) and pass the field name defined in MSG-Specific Metadata List, on the previous page to metaNameArray (the string is not case sensitive).
For example, the following code extracts the contents of the ConversationTopic and MessageID fields:
ExtSubFileMetaConfig metaConfig = new ExtSubFileMetaConfig();
ExtSubFileMetadata subfilemeta = null;
String[] metaNameArray = {"conversationtopic", "MessageID"};
subfilemeta = m_objFilter.extGetSubFileMetadata(extContextID, index, metaNameArray, metaConfig);
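Once the Sender field has been retrieved, an application often needs the display name and the address separately. The following is a minimal sketch that splits a string in the "PR_SENDER_NAME" <PR_SENDER_EMAIL_ADDRESS> format described in the MSG-specific metadata list. The class SenderFieldSketch and its method are hypothetical helpers, not part of the KeyView API, and the sketch assumes that the display name contains no embedded double quotation marks.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of the KeyView API) for splitting a Sender value. */
final class SenderFieldSketch {
    // Matches: "display name" <address>
    // Assumption: the display name itself contains no double quotation marks.
    private static final Pattern SENDER =
            Pattern.compile("^\\s*\"([^\"]*)\"\\s*<([^<>]*)>\\s*$");

    private SenderFieldSketch() {}

    /**
     * Returns {name, address} when the value has the documented shape,
     * or an empty Optional when it does not.
     */
    static Optional<String[]> split(String sender) {
        if (sender == null) {
            return Optional.empty();
        }
        Matcher m = SENDER.matcher(sender);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new String[] { m.group(1), m.group(2) });
    }
}
```

A caller would pass the string value returned for the Sender field and fall back to the raw string when the Optional is empty.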

Microsoft Outlook Express (EML) and Mailbox (MBX) Metadata
In addition to the default metadata set, you can extract any metadata field that exists in the header of an EML or MBX file by passing the field's name. If the name is a valid field in the file, the contents of the field are returned. For example, to retrieve the name of the last mail server that received the message before it was delivered, you can pass the string "Received".
Extract EML- or MBX-Specific Metadata
To extract specific metadata fields from an EML or MBX file, use the method extGetSubFileMetadata(long docContextID, int nSubFileIndex, java.lang.String[] metaNameArray, ExtSubFileMetaConfig config) and pass the metadata name to metaNameArray (the string is not case sensitive). For example, the following code extracts the contents of the Received and Mime-version fields:
ExtSubFileMetaConfig metaConfig = new ExtSubFileMetaConfig();
ExtSubFileMetadata subfilemeta = null;
String[] metaNameArray = {"Received", "Mime-version"};
subfilemeta = m_objFilter.extGetSubFileMetadata(extContextID, index, metaNameArray, metaConfig);

Lotus Notes Database (NSF) Metadata
In addition to the default metadata set, you can extract any Lotus field name that exists in an NSF file by passing the field's name. (You can extract fields from mail NSF files and non-mail NSF files.) If the name is a valid field in the file, the field is returned. For example, to retrieve the date a document in an NSF file was last accessed, you would pass the string "$LastAccessedDB".
NOTE: A complete list of NSF fields is provided in the Lotus Notes file stdnames.h. This header file is available in the Lotus API Toolkit.

Extract NSF-Specific Metadata
To extract specific metadata fields from an NSF file, use the method extGetSubFileMetadata(long docContextID, int nSubFileIndex, java.lang.String[] metaNameArray, ExtSubFileMetaConfig config) and pass the metadata name to metaNameArray (the string is not case sensitive).
For example, the following code extracts the contents of the Description and Categories fields:
ExtSubFileMetaConfig metaConfig = new ExtSubFileMetaConfig();
ExtSubFileMetadata subfilemeta = null;
String[] metaNameArray = {"description", "Categories"};
subfilemeta = m_objFilter.extGetSubFileMetadata(extContextID, index, metaNameArray, metaConfig);

Microsoft Personal Folders File (PST) Metadata
In addition to the default metadata set, you can extract Messaging Application Programming Interface (MAPI) properties from a PST file. These properties describe elements (subject, sender, recipient, and so on) of Outlook items within the PST file. Since the properties are stored in the PST file itself, they can be retrieved before the contents of the PST are extracted. This enables you to determine whether an Outlook item should be extracted based on a subfile's attributes. MAPI properties are also stored for Outlook attachments that are not mail messages (such as an attached Microsoft Word document or Lotus 1-2-3 file).

MAPI Properties
Each MAPI property is identified by a property tag, which is a constant that contains the property type and a unique identifier. For example, the property that indicates whether a message has attachments has the following components:

Property         PR_HASATTACH
Identifier       0x0E1B
Property type    PT_BOOLEAN (000B)
Property tag     0x0E1B000B

The Microsoft MAPI documentation on the Microsoft Developer Network website lists all available MAPI properties, their tags, and types.
You can retrieve any MAPI property that is of one of the MAPI property types listed below:

PT_I2         PT_DOUBLE    PT_STRING8
PT_I4         PT_FLOAT     PT_TSTRING
PT_BINARY     PT_LONG      PT_SYSTIME
PT_BOOLEAN    PT_SHORT     PT_UNICODE

NOTE: Properties with a PT_TSTRING type have the property type recompiled to either a Unicode string (PT_UNICODE) or to an ANSI string (PT_STRING8) depending on the operating system's character set. To retrieve the Unicode property, pass in the Unicode version of the tag. For example, the property tag for PR_SUBJECT is either 0x0037001E for an ANSI string, or 0x0037001F for a Unicode string.
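The relationship between identifier, property type, and property tag described under MAPI Properties can be sketched in a few lines of Java. This is an illustration only: MapiTagSketch and its methods are hypothetical names, not part of the KeyView API. The numeric values are the ones given in this section (0x0E1B and 000B for PR_HASATTACH, and the 001E and 001F suffixes of the two PR_SUBJECT tags).

```java
/** Hypothetical helper (not part of the KeyView API) for working with MAPI property tags. */
final class MapiTagSketch {
    static final int PT_BOOLEAN = 0x000B;
    static final int PT_STRING8 = 0x001E; // ANSI string
    static final int PT_UNICODE = 0x001F; // Unicode string

    private MapiTagSketch() {}

    /** Puts the 16-bit identifier in the high word and the 16-bit type in the low word. */
    static int makeTag(int identifier, int type) {
        return ((identifier & 0xFFFF) << 16) | (type & 0xFFFF);
    }

    /** Returns the identifier (high word) of a property tag. */
    static int identifierOf(int tag) {
        return tag >>> 16;
    }

    /** Returns the property type (low word) of a property tag. */
    static int typeOf(int tag) {
        return tag & 0xFFFF;
    }

    /** Returns the tag for the same property with a different type, for example ANSI to Unicode. */
    static int withType(int tag, int newType) {
        return makeTag(identifierOf(tag), newType);
    }

    public static void main(String[] args) {
        // PR_HASATTACH: identifier 0x0E1B, type PT_BOOLEAN
        System.out.printf("0x%08X%n", makeTag(0x0E1B, PT_BOOLEAN)); // prints 0x0E1B000B
    }
}
```

For example, withType(0x0037001E, MapiTagSketch.PT_UNICODE) yields 0x0037001F, the Unicode version of the PR_SUBJECT tag mentioned in the note above.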

Extract PST-Specific Metadata
In the call to extract subfile metadata, you can pass either the MAPI tag number (such as 0x0070001e) or one of the constants in the Filter class (such as KVPR_SUBJECT). These constants are a subset of MAPI properties and use a KeyView naming convention. For example, the property PR_CONVERSATION_TOPIC is defined as KVPR_CONVERSATION_TOPIC. If the property you want to retrieve is not defined as a constant in the Filter class, you must pass the MAPI tag number.
To extract specific MAPI properties from a PST file, use the method extGetSubFileMetadata(long docContextID, int nSubFileIndex, int[] metaNameArray, ExtSubFileMetaConfig config) and pass the tag number or constant to metaNameArray.
For example, the following code extracts the MAPI properties PR_SUBJECT and PR_ALTERNATE_RECIPIENT:
ExtSubFileMetaConfig metaConfig = new ExtSubFileMetaConfig();
ExtSubFileMetadata subfilemeta = null;
int[] metaNameArray = {Filter.KVPR_SUBJECT, 0x3A010102};
subfilemeta = m_objFilter.extGetSubFileMetadata(extContextID, index, metaNameArray, metaConfig);

Exclude Metadata from the Extracted Text File
When a mail message is extracted, both the message text and the header information (To, From, Sent, and so on) are extracted. You can prevent the header information from appearing in the text file.

To exclude the header information, call the setExcludeMailHeader() method of the ExtSubFileExtractConfig object, and pass ExtSubFileExtractConfig to the extExtractSubFile method. For example:
m_excludeMailHeader = true;
extconfig = new ExtSubFileExtractConfig();
extconfig.setExcludeMailHeader(m_excludeMailHeader);
extinfo = m_objFilter.extExtractSubFile(extContextID, i, extconfig);
Extract Subfiles from Outlook Files
When you extract an Outlook file (MSG) to disk, the message text and header information (To, From, Sent, and so on) are extracted to a text file. (If you do not want the header information to appear in the text file, see Exclude Metadata from the Extracted Text File, on the previous page.) If the Outlook file contains a non-mail attachment, the attachment is extracted in its native format to a subdirectory. If the Outlook file contains a mail attachment, the attachment's message text and any attachments are extracted to a subdirectory.
Extract Subfiles from Outlook Express Files
When you extract an Outlook Express (EML) file to disk, the message text and header information (To, From, Sent, and so on) are extracted to a text file. (If you do not want the header information to appear in the text file, see Exclude Metadata from the Extracted Text File, on the previous page.) If the Outlook Express file contains a non-mail attachment, the attachment is extracted in its native format to the same directory as the message text file. If the Outlook Express file contains a mail attachment, the complete attachment (including message text and attachments), the message text file, and any non-mail attachments are extracted to the same directory as the main message.
NOTE: When the MBX reader (mbxsr) is enabled, it is used to filter MBX and EML files. If the MBX reader is not enabled, the EML reader (emlsr) is used.
Extract Subfiles from Mailbox Files
A Mailbox (MBX) file is a collection of individual emails that comply with RFC 822 and RFC 2045 - 2049 (MIME), divided by message separators. Many mail applications export to an MBX format, such as Eudora Email and Mozilla Thunderbird. When an MBX file is extracted to disk, the message text and header information (To, From, Sent, and so on) from each mail file are extracted to text files. (If you do not want the header information to appear in the text file, see Exclude Metadata from the Extracted Text File, on the previous page.)

In Eudora MBX files, attachments are inserted as a link and are stored externally from the message. These attachments are not extracted, but the path to the attachment is returned in the call to the extGetSubFileInfo method. You can write code to retrieve the attachment based on the returned path.
For MBX files from other clients, KeyView extracts attachments when they are embedded in the message.
NOTE: The Mailbox (MBX) reader is an advanced feature and is sold and licensed separately. To enable this reader in a KeyView SDK, you must obtain the appropriate license key from Micro Focus.

Extract Subfiles from Outlook Personal Folders Files
KeyView can extract Outlook items such as messages, appointments, contacts, tasks, notes, and journal entries from a PST file. When a PST file is extracted to disk, the body text and header information (To, From, Sent, and so on) from each Outlook item are extracted to a text file. (If you do not want the header information to appear in the text file, see Exclude Metadata from the Extracted Text File, on page 43.)
You can also extract messages from PST files as MSG files, including all their attachments, using the setSaveAsMSG() method in the ExtSubFileExtractConfig class.
If an Outlook item contains a non-mail attachment, the attachment is extracted in its native format to a subdirectory. If an Outlook item contains an Outlook attachment, the attached item's body text and any attachments are extracted to a subdirectory.
NOTE: The Microsoft Outlook Personal Folders (PST) readers are an advanced feature and are sold and licensed separately. To enable these readers in a KeyView SDK, you must obtain an appropriate license key from Micro Focus. For information about adding a new license key to an existing installation, see Pass License Information to KeyView, on page 18.
Choose the Reader to use for PST Files
KeyView provides several ways of processing PST files:
• Indirectly, using the Microsoft Messaging Application Programming Interface (MAPI). MAPI is a Microsoft interface that enables different applications to exchange messages and attachments with each other. MAPI allows KeyView to open a PST file, traverse the folders, and extract items. The pstsr reader uses MAPI, but works only on Windows and requires that Microsoft Outlook is installed.
• Directly, without relying on the Microsoft interface to the PST format. Accessing the file directly does not require Microsoft Outlook. The pstxsr reader is available only on certain platforms (see pstxsr in the platform differences section). The pstnsr reader is an alternative native reader, for the platforms not supported by pstxsr.


On Windows, the MAPI-based reader is used by default but you can choose pstxsr if you prefer. On non-Windows platforms, only one of the native readers is available.
The differences between the readers are summarized in the following table.

Feature
Platforms supported
Outlook required MAPI properties supported Password protection supported
Compressible encryption supported High encryption supported

Native Reader (pstxsr)

Native Reader (pstnsr)

MAPI-based Reader (pstsr)

Windows x86 and x64 Linux x64 and AArch64

All platforms not supported by pstxsr

Windows x86 and x64

No

No

Yes

Yes. All properties defined in mapitags.h. Object properties are not supported.

Yes

Yes

Yes (using KVCredential structure)

Yes

Yes

Yes

No

No

Yes

To change the reader used to process PST files, change the PST entry (file category value 297) in the formats.ini file. For example, to use pstxsr:
297=pstx
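If you need to switch readers from code (for example, in a deployment script), the change amounts to rewriting one key=value line in formats.ini. The following is a minimal sketch under stated assumptions: FormatsIniSketch is a hypothetical helper, not part of the KeyView API, it assumes the key appears only once in the file, and it does not track which section the key belongs to.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of the KeyView API) for changing one formats.ini entry. */
final class FormatsIniSketch {
    private FormatsIniSketch() {}

    /**
     * Returns a copy of the given lines in which the entry for key is set to value.
     * Lines that do not define the key (including sections and comments) are kept as they are.
     */
    static List<String> setEntry(List<String> lines, String key, String value) {
        List<String> out = new ArrayList<>(lines.size());
        for (String line : lines) {
            String trimmed = line.trim();
            int eq = trimmed.indexOf('=');
            boolean definesKey = eq > 0 && trimmed.substring(0, eq).trim().equals(key);
            out.add(definesKey ? key + "=" + value : line);
        }
        return out;
    }
}
```

A caller could read the file with java.nio.file.Files.readAllLines, call setEntry(lines, "297", "pstx"), and write the result back with Files.write. Back up formats.ini first, because the sketch performs no validation of the reader name.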
NOTE: You must make sure that the PST that you are extracting is not open in the Outlook client, and that the Outlook process is not running.

NOTE: When extracting subfiles from PST files, information on the distribution list used in an email is extracted to a file called emailname.dist. This applies to the MAPI reader (pstsr) only.

System Requirements
MAPI is supported on Windows platforms only and relies on functionality in Outlook. If you want to use the MAPI-based reader, pstsr, Microsoft Outlook must be installed on the same machine as your application. Outlook must also be the default email application.
KeyView supports the following PST formats and Outlook clients:
• Outlook 97 or later PST files
   NOTE: The Outlook client must be the same version as, or newer than, the version of Outlook that generated the PST file.
• Outlook 2002 or later clients

NOTE: You must install an edition of Microsoft Outlook (32-bit or 64-bit) that matches the KeyView software. For example, if you use 32-bit KeyView, install 32-bit Outlook. If you use 64-bit KeyView, install 64-bit Outlook. If the editions do not match, KeyView returns Error 32: KVError_PSTAccessFailed and an error message from Microsoft Office Outlook is displayed: Either there is no default mail client or the current mail client cannot fulfill the messaging request. Please run Microsoft Outlook and set it as the default mail client.
MAPI Attachment Methods
The way in which you can access the contents of a PST message attachment is determined by the MAPI attachment method applied to the attachment. For example, if the attachment is an embedded OLE object, it uses the ATTACH_OLE attachment method. KeyView can access message attachments that use the following attachment methods:
ATTACH_BY_VALUE
ATTACH_EMBEDDED_MSG
ATTACH_OLE
ATTACH_BY_REFERENCE
ATTACH_BY_REF_ONLY
ATTACH_BY_REF_RESOLVE
Attachments using the ATTACH_BY_VALUE, ATTACH_EMBEDDED_MSG, or ATTACH_OLE attachment methods are extracted automatically when the PST file is extracted. An "attach by reference" method means that the attachment is not in Outlook, but Outlook contains an absolute path to the attachment. Before you can extract these types of attachments, you must retrieve the path to access the attachment.
To extract "attach by reference" attachments
1. Determine whether the attachment uses an ATTACH_BY_REFERENCE, ATTACH_BY_REF_ONLY, or ATTACH_BY_REF_RESOLVE method by retrieving the MAPI property PR_ATTACH_METHOD.
2. If the attachment uses one of the "attach by reference" methods, get the fully qualified path to the attachment by retrieving the MAPI properties PR_ATTACH_LONG_PATHNAME or PR_ATTACH_PATHNAME.
3. You can then either copy the files from their original location to the path where the PST file is extracted, or use the Filter API methods to filter the attachment.
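The decision logic in the steps above can be sketched as follows, assuming that your code has already retrieved the PR_ATTACH_METHOD value and the two path properties through the metadata calls described earlier. AttachByReferenceSketch is a hypothetical helper, not part of the KeyView API. The numeric constants are the standard MAPI values for PR_ATTACH_METHOD; preferring the long path name over the short one is an assumption of this sketch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Optional;

/** Hypothetical helper (not part of the KeyView API) for "attach by reference" attachments. */
final class AttachByReferenceSketch {
    // PR_ATTACH_METHOD values as defined by MAPI.
    static final int ATTACH_BY_VALUE = 1;
    static final int ATTACH_BY_REFERENCE = 2;
    static final int ATTACH_BY_REF_RESOLVE = 3;
    static final int ATTACH_BY_REF_ONLY = 4;
    static final int ATTACH_EMBEDDED_MSG = 5;
    static final int ATTACH_OLE = 6;

    private AttachByReferenceSketch() {}

    /** Step 1: true when the attachment is stored outside the PST file. */
    static boolean isByReference(int attachMethod) {
        return attachMethod == ATTACH_BY_REFERENCE
                || attachMethod == ATTACH_BY_REF_RESOLVE
                || attachMethod == ATTACH_BY_REF_ONLY;
    }

    /** Step 2: picks the long path name when present (an assumption), else the short one. */
    static Optional<String> choosePath(String longPathname, String pathname) {
        if (longPathname != null && !longPathname.isEmpty()) {
            return Optional.of(longPathname);
        }
        if (pathname != null && !pathname.isEmpty()) {
            return Optional.of(pathname);
        }
        return Optional.empty();
    }

    /** Step 3, first option: copies the referenced file into the extraction directory. */
    static Path copyToExtractDir(String attachmentPath, Path extractDir) throws IOException {
        Path source = Paths.get(attachmentPath);
        Path target = extractDir.resolve(source.getFileName());
        return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

The copy step only succeeds if the referenced path is reachable from the machine that runs the extraction, which is not guaranteed for paths recorded on another system.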
Open Secured PST Files
KeyView enables you to specify credentials (user name and password), which are used to open a secured PST file for extraction. See Password Protected Files, on page 269 for more information.

Detect PST Files While the Outlook Client is Running
If you are running an Outlook client while running the File Extraction API, the KeyView format detection module (kwad) might not be able to open the PST file to determine the file's format because Outlook has the file locked. In this case, you can do one of the following:
• Close Outlook when using the Extraction API.
• Detect PST files by extension only and bypass the format detection module. To enable this option, add the following lines to the formats.ini file.
[container_flags]
detectPSTbyExtension=1
NOTE: The detectPSTbyExtension option only applies when you are using the MAPI reader (pstsr).
NOTE: If you use this option, you must make sure in your code that valid PST files are passed to KeyView because the format detection module will not be available to verify the file type and pass the file to the appropriate reader.
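One way to approach the validation that the note above asks for is sketched below. PstCheckSketch is a hypothetical helper, not part of the KeyView API. It checks the file extension and, where the file can be read, the four-byte signature "!BDN" with which PST files begin. Reading the signature is an assumption about your environment: it may fail while Outlook holds a lock on the file, in which case only the extension check is available.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

/** Hypothetical pre-check (not part of the KeyView API) for files passed as PST. */
final class PstCheckSketch {
    // PST files begin with the bytes "!BDN".
    private static final byte[] MAGIC = { 0x21, 0x42, 0x44, 0x4E };

    private PstCheckSketch() {}

    /** True when the file name ends with .pst, ignoring case. */
    static boolean hasPstExtension(String fileName) {
        return fileName != null && fileName.toLowerCase(Locale.ROOT).endsWith(".pst");
    }

    /** True when the given leading bytes start with the PST signature. */
    static boolean hasPstSignature(byte[] header) {
        if (header == null || header.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (header[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Combines both checks; throws IOException if the file cannot be read (for example, when locked). */
    static boolean looksLikePst(Path file) throws IOException {
        if (!hasPstExtension(String.valueOf(file.getFileName()))) {
            return false;
        }
        byte[] header = new byte[MAGIC.length];
        try (InputStream in = Files.newInputStream(file)) {
            int off = 0;
            while (off < header.length) {
                int n = in.read(header, off, header.length - off);
                if (n < 0) {
                    return false; // file is shorter than the signature
                }
                off += n;
            }
        }
        return hasPstSignature(header);
    }
}
```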
Extract Subfiles from Lotus Domino XML Language Files
When you extract a Lotus Domino XML Language (.DXL) file, the message text and header information (To, From, Sent, and so on) are extracted to a text file.
NOTE: To prevent header information from being extracted, see Exclude Metadata from the Extracted Text File, on page 43.
You can make sure that dates and times extracted from Lotus Domino .DXL files are displayed in a uniform format.
To extract custom date/time formats
• In the formats.ini file, set the DateTimeFormat option in the [dxlsr] section. For example:
[dxlsr]
DateTimeFormat=%m/%d/%Y %I:%M:%S %p
In this example, dates and times are extracted in the following format:
02/11/2003 11:36:09 AM
The format arguments are the same as those for the strftime() function. See http://msdn.microsoft.com/en-us/library/fe06s4ak%28VS.71%29.aspx for more information.
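If your Java application needs to read the extracted dates back into date objects, it needs a java.time pattern that corresponds to the configured strftime format. The following is a minimal sketch for the example format %m/%d/%Y %I:%M:%S %p only. DxlDateFormatSketch is a hypothetical helper, not part of the KeyView API, and the use of Locale.US for the AM/PM marker is an assumption.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/** Hypothetical helper (not part of the KeyView API) for the example DateTimeFormat value. */
final class DxlDateFormatSketch {
    // java.time equivalent of the strftime format %m/%d/%Y %I:%M:%S %p:
    // MM = month, dd = day, uuuu = year, hh = 12-hour clock, mm = minute, ss = second, a = AM/PM.
    static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("MM/dd/uuuu hh:mm:ss a", Locale.US);

    private DxlDateFormatSketch() {}

    /** Formats a date and time the way the example configuration does. */
    static String format(LocalDateTime value) {
        return value.format(FORMAT);
    }

    /** Parses a string such as "02/11/2003 11:36:09 AM" into a LocalDateTime. */
    static LocalDateTime parse(String text) {
        return LocalDateTime.parse(text, FORMAT);
    }
}
```

If you configure a different DateTimeFormat value, the pattern string must be changed to match it.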

Extract .DXL Files to HTML
You can use the file extraction API to process .DXL files with an XSLT engine. The XSLT engine then transforms the extracted .DXL to .mail HTML files.
To extract .DXL files to HTML
• Set the following options in the formats.ini file:
[nsfsr]
ExportDXL=1
ExportDXL_PureXML=1
[dxlsr]
LNDParser=2
Extract Subfiles from Lotus Notes Database Files
A Lotus Notes database is a single file that contains multiple documents called notes. Notes include design notes (such as forms, views, folders, navigators, outlines, pages, framesets, agents, and resources), data document notes, profile document notes, access control list notes, and collection (index) notes. KeyView can extract text items, attachments, and OLE objects from data document notes only. Data document notes include emails, journal entries, discussion threads, documents (Microsoft Office and Lotus SmartSuite), and so on.
All components of a note are prefixed by field names such as "SendTo:", "Subject:", and "Body:". When a note is extracted, the field names are not included in the extracted output; only the field values are extracted.
When a mail message in an NSF file is extracted to disk, the body text and header information (such as the values from the SendTo, From, and DeliveredDate fields) in each message is extracted to a text file. (If you do not want the header information to appear in the message text file, see Exclude Metadata from the Extracted Text File, on page 43.)
NOTE: The Lotus Notes Database (NSF) reader is an advanced feature and is sold and licensed separately. To enable this reader in a KeyView SDK, you must obtain the appropriate license key from Micro Focus.
System Requirements
The Lotus Notes Database (NSF) reader is available only on certain platforms (see nsfsr in the platform differences section).
KeyView accesses NSF files indirectly by using the Lotus Notes API. Because the NSF reader relies on functionality in Lotus Notes, a Notes client or Domino server must be installed and configured on the same machine as KeyView. On UNIX and Linux, the Domino server is required. On Windows, the Notes client or Domino server is required. For information about the supported versions of Notes or Domino, see Software Dependencies, on page 14.
Installation and Configuration
Before KeyView can filter NSF files, you must set up the Notes client or Domino server. Full configuration is not required. The following steps outline the minimal setup for NSF filtering:
Windows
1. Install the Lotus Notes client or Lotus Domino server. You do not need to configure the client or server.
2. Make sure that the notes.ini file is in the proper location.
   • If Lotus Notes is installed, the file should appear in the install\lotus\notes directory, where install is the installation directory.
   • If only Lotus Domino is installed, the file should appear in the install\lotus\domino directory, where install is the installation directory.
   If the file does not exist, create an ASCII file named notes.ini, and add the following text:
   [Notes]
3. Add the KeyView bin directory and the install\lotus\notes or install\lotus\domino directory to the PATH environment variable (the KeyView bin directory must be first in the path). Micro Focus recommends that you add the KeyView bin directory because the Lotus Notes or Domino server installation might contain older KeyView OEM libraries.
Linux
1. Install Lotus Domino server. You do not need to configure the server.
2. Make sure that the notes.ini file is in the install/lotus/notes/latest/linux directory, where install is the directory where Lotus Notes is installed. If the file does not exist, create an ASCII file named notes.ini, and add the following text:
   [Notes]
3. Add the install/lotus/notes/latest/linux directory to the PATH environment variable:
   setenv PATH install/lotus/notes/latest/linux:$PATH
4. Add the install/lotus/notes/latest/linux and the KeyView bin directory to the LD_LIBRARY_PATH environment variable:
   setenv LD_LIBRARY_PATH keyview_bin:install/lotus/notes/latest/linux:$LD_LIBRARY_PATH
   where keyview_bin is the location of the KeyView bin directory. Micro Focus recommends that you add the KeyView bin directory because the Lotus Notes installation might contain older KeyView OEM libraries.

AIX 5.x
1. Install the bos.iocp.rte file set if it is not already installed, and reboot the machine. See the Lotus Domino server documentation for more information.
2. Install Lotus Domino server. You do not need to configure the server.
3. Make sure that the notes.ini file is in the install/lotus/notes/latest/ibmpow directory, where install is the directory where Lotus Notes is installed. If the file does not exist, create an ASCII file named notes.ini, and add the following text:
   [Notes]
4. Add the install/lotus/notes/latest/ibmpow directory to the PATH environment variable:
   setenv PATH install/lotus/notes/latest/ibmpow:$PATH
5. Add the install/lotus/notes/latest/ibmpow and the KeyView bin directory to the LIBPATH environment variable:
   setenv LIBPATH keyview_bin:install/lotus/notes/latest/ibmpow:$LIBPATH
   where keyview_bin is the location of the KeyView bin directory. Micro Focus recommends that you add the KeyView bin directory because the Lotus Notes installation might contain older KeyView OEM libraries.
Open Secured NSF Files
KeyView enables you to specify a user ID file and password to use to open a secured NSF file for extraction. See Password Protected Files, on page 269 for more information.
Format Note Subfiles
The KeyView NSF reader uses XML templates to format note subfiles. You can customize the templates to approximate the look and feel of the original notes as closely as possible. For more information, see Extract and Format Lotus Notes Subfiles, on page 221.
Extract Subfiles from PDF Files
KeyView can extract document-level and page-level attachments from a PDF document. Document-level attachments are added by using the Attach A File tool, and can include links to or from the parent document or to other file attachments. Page-level attachments are added as comments by using various tools. Page-level or comment attachments display the File Attachment icon or the Speaker icon on the page where they are located.
KeyView can also extract the files from Portfolio PDFs.
When a PDF file is extracted to disk, the PDF file is extracted to a directory and the PDF's attachments are saved in their native format to the same directory as the original PDF file.

Improve Performance for PDFs with Many Small Images
To improve performance when processing PDF files that contain many small images, you can choose to ignore images unless they exceed a minimum width and/or height. If an image is smaller than the minimum width or height, KeyView does not extract the image. For example, to ignore images that are less than 16 pixels wide or less than 16 pixels in height, add the following to the [pdf_flags] section of the formats.ini file:
[pdf_flags]
process_images_with_min_width=16
process_images_with_min_height=16
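The size rule can be read as a simple predicate. The sketch below restates the rule as documented in this section; it is not KeyView's implementation, and PdfImageSizeRuleSketch is a hypothetical name. It follows the worked example, in which an image is ignored only when it is strictly smaller than a configured minimum, so an image that is exactly 16 by 16 pixels is still processed.

```java
/** Hypothetical restatement of the documented size rule (not KeyView's implementation). */
final class PdfImageSizeRuleSketch {
    private PdfImageSizeRuleSketch() {}

    /**
     * Returns true when an image of the given size would be processed:
     * it must not be narrower than minWidth and not be shorter than minHeight.
     */
    static boolean isProcessed(int width, int height, int minWidth, int minHeight) {
        return width >= minWidth && height >= minHeight;
    }
}
```

With both minimums set to 16, a 200 by 10 pixel image is ignored because its height is below the minimum, even though its width is not.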
Extract Embedded OLE Objects
The File Extraction API can extract embedded OLE objects from the following types of documents:
• Lotus Notes (DXL)
• Microsoft Excel
• Microsoft Word
• Microsoft PowerPoint
• Microsoft Outlook
• Microsoft Visio
• Microsoft Project
• OASIS Open Document
• Rich Text Format (RTF)
When an embedded OLE object is extracted from its parent file, the location of the embedded file in the original document is not available. The parent and child are extracted as separate files.
Extract Subfiles from ZIP Files
You can extract ZIP files that are not password-protected by using the general method (see Extract Subfiles, on page 34). However, some ZIP files use password protection, in which case you must use a different method to enter the required credentials. See Password Protected Files, on page 269 for more information.

Default File Names for Extracted Subfiles
When a file name is not specified in the call to extExtractSubFile, in some cases, a default file name is applied to the extracted subfile.

Default File Name for Mail Formats

To avoid naming conflicts and problems with long file names, KeyView applies its own names to the extracted mail folders and mail items when a name is not supplied in the call to extExtractSubFile. A non-mail attachment retains its original file name and extension.
When the contents of a mail store or the message body of a mail message are extracted, the extracted file names might include the following:
• The first valid eight characters of the original folder name or "Subject" line of the mail message. If the "Subject" line is empty, the characters kvext are used, where ext is the format's extension. For example, the characters would be "kvmsg" for MSG, and "kvnsf" for NSF.
   The following special characters are considered invalid and are ignored:
   - any non-printing character with a value less than 0x1F
   - angle brackets (< >)
   - double quotation mark (")
   - asterisk (*)
   - forward slash (/)
   - back slash (\)
   - pipe (|)
   - colon (:)
   - question mark (?)
   For notes, the file name is derived from the first 24 characters of the note text. For contact entries, the file name is derived from the full name of the contact.
• The characters _kvn, where n is an integer incremented from 0 for each extracted item.
• One of the following extensions:

Type                    File Extension
email message           .mail
calendar appointment    .cal
contact entry           .cont
task entry              .task
note                    .note
journal entry           .jrnl
distribution list       .dist
posting note            .post

If the type cannot be determined for an MSG or PST file, the file is given a .mail extension. If the type cannot be determined for an NSF file, the file is given a .tmp extension.
For example, an MSG mail message with the subject line "RE: Product roadmap" that contains the Microsoft Excel attachment release_schedule.xls is extracted as
RE produ_kv0.mail
release_schedule.xls
If an extracted message contains an embedded OLE object or any attachment that does not have a name, the object or attachment is extracted as _kv#.tmp.
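If your application needs to predict or recognize these default names, the naming rules for mail items can be approximated as shown below. This is a sketch of the documented rules, not KeyView's implementation: DefaultMailNameSketch is a hypothetical helper, it treats a subject that contains only invalid characters as empty (an assumption), and it preserves the case of the subject, whereas the example in this section shows a lowercase "produ".

```java
import java.util.Locale;

/** Hypothetical approximation (not KeyView's implementation) of default mail item names. */
final class DefaultMailNameSketch {
    // Characters listed in this section as invalid: < > " * / \ | : ?
    private static final String INVALID = "<>\"*/\\|:?";

    private DefaultMailNameSketch() {}

    /** Returns the first eight characters that remain after invalid characters are skipped. */
    static String prefix(String source) {
        StringBuilder sb = new StringBuilder(8);
        for (int i = 0; i < source.length() && sb.length() < 8; i++) {
            char c = source.charAt(i);
            // Skip non-printing characters below 0x1F and the listed special characters.
            if (c < 0x1F || INVALID.indexOf(c) >= 0) {
                continue;
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /**
     * Builds prefix + "_kv" + n + typeExtension, for example "_kv0.mail".
     * formatExt is the container format's extension, such as "msg" or "nsf".
     */
    static String mailItemName(String subject, String formatExt, int n, String typeExtension) {
        String p = prefix(subject == null ? "" : subject);
        if (p.isEmpty()) {
            p = "kv" + formatExt.toLowerCase(Locale.ROOT);
        }
        return p + "_kv" + n + typeExtension;
    }
}
```

For the subject line "RE: Product roadmap", the sketch returns "RE Produ_kv0.mail"; for an empty subject in an MSG file, it returns "kvmsg_kv0.mail".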

Default File Name for Embedded OLE Objects

KeyView can apply a default name to an extracted embedded OLE object when a name is not supplied in the call to extExtractSubFile. When an embedded OLE object is extracted, the extracted file name might include the following:
• The first valid eight characters of the main file. The following special characters are considered invalid and are ignored:
   - any non-printing character with a value less than 0x1F
   - angle brackets (< >)
   - double quotation mark (")
   - asterisk (*)
   - forward slash (/)
   - back slash (\)
   - pipe (|)
   - colon (:)
   - question mark (?)
• The characters _kvn, where n is an integer incremented from 0 for each extracted object.
• If KeyView can determine the embedded OLE is a Microsoft Office document, the original extension is used. If the file type cannot be determined, the file is given a .tmp extension.
For example, suppose that a Microsoft Word document (sales_quarterly.doc) contains two embedded OLE objects: a Microsoft Excel file called west_region.xls, and a bitmap created in the Word document. The embedded objects would be extracted as
sales_qu_kv0.xls
sales_qu_kv1.tmp


Chapter 4: Use the Filter API

This section describes how to perform some basic filtering tasks by using the Filter API.

· Generate an Error Log
· Extract Metadata
· Convert Character Sets
· Extract Tracked Deleted Text
· Filter PDF Files
· Filter Spreadsheet Files
· Filter Presentation Files to a Logical Reading Order
· Filter HTML Files
· Filter XML Files
· Configure Headers and Footers
· Error Messages
· Tab Delimited Output for Embedded Tables
· Exclude Japanese Guide Text
· Source Code Identification
· Optical Character Recognition
· Configure the Proxy for RMS
· Document Restrictions

Generate an Error Log
You can monitor and debug filtering operations by enabling a detailed error log. This allows you to see errors that are generated at run time and to track problem files in stream or file mode.
NOTE: Error logs are not generated when in-process filtering is enabled.
The error log might include the following information:
• Generated error messages.
• Time stamp.
• Path and file name of the file in which the error occurred.
• Length of the file in which the error occurred. If neither the name of the original file nor the name of the temporary file is obtained in stream mode, the file length is reported.
The following is a sample log file:

-KVOOPE 12 # Time: 11:14:32 # File Len = 68140
-KVOOPE 13 # Time: 11:23:05 # H:\files\WP\Word97\fnldmsa.doc
-KVOOPE 5 # Time: 12:15:54 # H:\files\SS\XL2000\corporate.xsl
-KVOOPE 5 # Time: 12:45:19 # H:\files\WP\WPerf5\wp501.doc
-KVOOPE 12 # Time: 14:25:33 # H:\files\PG\PPoint95\95.ppt
-KVOOPE 26 # Time: 16:26:04 # File Len = 19117568
-KVOOPE 10 # Time: 20:27:40 # File Len = 19117568
You can specify the information that is written to the log file using either the API or environment variables. To configure a log file for a single filtering session, use environment variables. To configure a log file for all filtering sessions, use the API. Configuring the log file using the API overrides the same settings in the environment variables. You can also specify additional settings in the formats.ini file.
You can configure the following features of the log file:
l Enable or disable logging. See Enable or Disable Error Logging, below.
l Change the default path and file name of the log file. See Change the Path and File Name of the Log File, on the next page.
l Include memory errors in the log file. See Report Memory Errors, on the next page.
l Specify a memory guard that is used to generate memory overwrite errors in the log. See Specify a Memory Guard, on the next page.
l Include the input file name in the log file when filtering a stream. See Report the File Name in Stream Mode, on page 58.
l Specify the maximum size of the log file. See Specify the Maximum Size of the Log File, on page 58.
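Entries in the sample log shown earlier follow a regular pattern: an error code, a time stamp, and then either a file path or a file length. The following minimal sketch shows one way to read such entries for reporting purposes. It assumes that every entry has the form KVOOPE code # Time: hh:mm:ss # detail, which is inferred from the sample only; the KvoopLogEntry class and its members are hypothetical and are not part of the KeyView API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: parses one entry of the error log.
// The pattern is inferred from the sample log only; real logs may differ.
final class KvoopLogEntry {
    private static final Pattern ENTRY = Pattern.compile(
        "^-?KVOOPE\\s+(\\d+)\\s+#\\s+Time:\\s+(\\d{2}:\\d{2}:\\d{2})\\s+#\\s+(.*)$");
    private static final Pattern FILE_LEN = Pattern.compile("^File Len = (\\d+)$");

    final int errorCode;
    final String time;
    final String filePath;  // null when the entry reports a file length instead
    final long fileLength;  // -1 when the entry reports a file path instead

    private KvoopLogEntry(int errorCode, String time, String filePath, long fileLength) {
        this.errorCode = errorCode;
        this.time = time;
        this.filePath = filePath;
        this.fileLength = fileLength;
    }

    static KvoopLogEntry parse(String line) {
        Matcher m = ENTRY.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized log entry: " + line);
        }
        int code = Integer.parseInt(m.group(1));
        String detail = m.group(3).trim();
        Matcher len = FILE_LEN.matcher(detail);
        if (len.matches()) {
            return new KvoopLogEntry(code, m.group(2), null, Long.parseLong(len.group(1)));
        }
        return new KvoopLogEntry(code, m.group(2), detail, -1L);
    }

    public static void main(String[] args) {
        KvoopLogEntry entry = parse("-KVOOPE 12 # Time: 11:14:32 # File Len = 68140");
        System.out.println(entry.errorCode + " at " + entry.time + ", length " + entry.fileLength);
    }
}
```

The error codes can then be looked up in the list of error messages for the release in use.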
Enable or Disable Error Logging
You can enable or disable error logging using either the API or environment variables. By default, a file called kvoop.log is created in the system temporary directory; however, you can change the path and file name of this file (see Change the Path and File Name of the Log File, on the next page).
Use the API
To enable or disable logging in the API, instantiate the Filter object using the constructor Filter(java.lang.String outputCharSet, long filterFlags), and set the filterFlags argument to either FILTERFLAG_OOPLOGON or FILTERFLAG_OOPLOGOFF. For example:
objFilter = new Filter(outputCharSet, Filter.FILTERFLAG_OOPLOGON);
Use Environment Variables
To enable logging, add the environment variable KVOOPLOGON, and set the variable value to 1. To disable logging, do not set the environment variable KVOOPLOGON.
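If the filtering application is started from another Java program, the environment variable can be set for that child process only, rather than system-wide. The following minimal sketch uses the standard java.lang.ProcessBuilder class. The KvoopLogEnvironment class and the command line are hypothetical; the sketch assumes that the launched process is the one that runs KeyView and therefore reads KVOOPLOGON.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical helper: prepares a child process with error logging
// enabled or disabled through the KVOOPLOGON environment variable.
final class KvoopLogEnvironment {
    static ProcessBuilder withErrorLogging(List<String> command, boolean enable) {
        ProcessBuilder pb = new ProcessBuilder(command);
        Map<String, String> env = pb.environment();
        if (enable) {
            env.put("KVOOPLOGON", "1");   // logging is enabled when the value is 1
        } else {
            env.remove("KVOOPLOGON");     // logging is disabled when the variable is not set
        }
        return pb;
    }

    public static void main(String[] args) {
        // "java -jar myfilterapp.jar" is a placeholder command line.
        ProcessBuilder pb = withErrorLogging(
            Arrays.asList("java", "-jar", "myfilterapp.jar"), true);
        System.out.println("KVOOPLOGON=" + pb.environment().get("KVOOPLOGON"));
    }
}
```

The process is only configured here; call start() on the returned ProcessBuilder to launch it.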

Change the Path and File Name of the Log File
You can change the default path and file name of the log file. The default is C:\temp\kvoop.log on Windows and /tmp/kvoop.log on UNIX. To change the path and file name of the log file, add the following to the formats.ini file:
[kvooplog]
KvoopLogName=filepath
The formats.ini file is in the directory install\OS\bin, where install is the path name of the Filter installation directory and OS is the name of the operating system.
Report Memory Errors
You can report memory leaks and memory overwrites in the log file by enabling the memory trace system, either by using the API or environment variables. If the memory trace system is enabled, the error messages for memory leaks and memory overwrites (KVError_MemoryLeak and KVError_MemoryOverwrite, respectively) are reported in the log file when they are generated. The error messages are listed in Error Messages, on page 81.
NOTE: To report memory overwrites, you must also set a memory guard. See Specify a Memory Guard, below.
Use the API
To enable or disable the memory trace system in the API, instantiate the Filter object using the constructor Filter(java.lang.String outputCharSet, long filterFlags), and set the filterFlags argument to either FILTERFLAG_OOPMEMTRACEON or FILTERFLAG_OOPMEMTRACEOFF. For example:
objFilter = new Filter(outputCharSet, Filter.FILTERFLAG_OOPMEMTRACEON);
Use Environment Variables
To enable the memory trace system, add the KVOOPMT environment variable, and set its value to 1. To disable the memory trace system, do not set the KVOOPMT environment variable.
Specify a Memory Guard
To report memory overwrites in the log file, you must set a memory guard that protects against memory overwrites. Normally, this is set in the range of 100-200 bytes. For example, if a memory guard of 100 is set and 20 bytes of memory are requested, a total of 120 bytes of memory are allocated. The additional memory is used to monitor and identify memory overwrites. To configure the memory guard, add the following section to the formats.ini file:
[Kvooplog]
mg=100
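The following generic sketch illustrates the idea behind a memory guard. It is not KeyView's implementation: placing the guard after the requested bytes and filling it with a marker value are assumptions made for illustration, and the GuardedBuffer class is hypothetical. The sketch reproduces the arithmetic of the example above (20 requested bytes plus a guard of 100 gives 120 allocated bytes) and shows how a write past the requested region can be detected.

```java
import java.util.Arrays;

// Hypothetical illustration of a guard region; not part of the KeyView API.
final class GuardedBuffer {
    private static final byte SENTINEL = (byte) 0xA5; // arbitrary marker value

    private final byte[] storage;
    private final int requested;

    GuardedBuffer(int requested, int guard) {
        this.requested = requested;
        this.storage = new byte[requested + guard];
        // Fill only the guard region with the marker value.
        Arrays.fill(storage, requested, storage.length, SENTINEL);
    }

    int totalAllocated() {
        return storage.length;
    }

    // Deliberately not checked against 'requested', so that an overrun can occur.
    void write(int index, byte value) {
        storage[index] = value;
    }

    // An overwrite is detected when any guard byte no longer holds the marker.
    boolean guardOverwritten() {
        for (int i = requested; i < storage.length; i++) {
            if (storage[i] != SENTINEL) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        GuardedBuffer buffer = new GuardedBuffer(20, 100);
        System.out.println("Allocated bytes: " + buffer.totalAllocated());
        buffer.write(20, (byte) 0); // one byte past the requested region
        System.out.println("Overwrite detected: " + buffer.guardOverwritten());
    }
}
```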


Report the File Name in Stream Mode
When you run Filter in file mode, the file name is always reported in the log file. To report the file name in stream mode, you must extract it through the API.
To add the input file name to the log
1. Create an instance of ConfigOption with the following arguments:
a. Set the OptionType to CFG_SETOOPSRCFILE.
b. Set the OptionValue to 0.
c. Set OptionData to the input_filename.
2. Call the setConfigOption method, and pass in the ConfigOption instance.
Example
if((filterFlags & Filter.FILTERFLAG_OOPLOGON) == Filter.FILTERFLAG_OOPLOGON) {
    ConfigOption config = new ConfigOption(Filter.CFG_SETOOPSRCFILE, 0, inFile);
    objFilter.setConfigOption(config);
}

Specify the Maximum Size of the Log File
You can specify the maximum size of the log file. When this size is reached and new entries are logged, either the first entry in the file is overwritten or the new entries are not reported.
To configure the maximum log size and whether old entries are overwritten, add the following section to the formats.ini file:
[Kvooplog]
LogFileSize=10
OverWriteLog=1

l LogFileSize--This option specifies the maximum size of the log file in KB. The minimum is 1 KB. If a size is not specified, the default of 2 MB is used.
l OverWriteLog--This option determines whether the log file is overwritten when the maximum log file size (LogFileSize) is reached. If you set this option to 1, the first entry in the log file is overwritten. If you set this option to 0, new entries are not reported in the log file.
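When several machines need the same logging configuration, the section can be generated rather than edited by hand. The following minimal sketch renders the logging options described in this chapter (KvoopLogName, mg, LogFileSize, and OverWriteLog) as text for the formats.ini file. The KvoopLogIniSection class is hypothetical, and placing all four options under a single [Kvooplog] section is an assumption; the guide shows them in separate examples.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: renders logging options as formats.ini text.
final class KvoopLogIniSection {
    static String render(String logPath, int memoryGuardBytes, int logFileSizeKb, boolean overwrite) {
        if (logFileSizeKb < 1) {
            throw new IllegalArgumentException("LogFileSize must be at least 1 KB");
        }
        // LinkedHashMap keeps the options in insertion order.
        Map<String, String> options = new LinkedHashMap<String, String>();
        options.put("KvoopLogName", logPath);
        options.put("mg", Integer.toString(memoryGuardBytes));
        options.put("LogFileSize", Integer.toString(logFileSizeKb));
        options.put("OverWriteLog", overwrite ? "1" : "0");

        StringBuilder sb = new StringBuilder("[Kvooplog]\n");
        for (Map.Entry<String, String> option : options.entrySet()) {
            sb.append(option.getKey()).append('=').append(option.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The log path is a placeholder.
        System.out.print(render("/tmp/kvoop.log", 100, 10, true));
    }
}
```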

Extract Metadata
When a file format supports metadata, KeyView can extract and process that information. Metadata includes document information fields such as title, author, creation date, and file size. Depending on the file's format, metadata is referred to in a number of ways: for example, "summary information," "OLE summary information," "file information," and "document properties." The metadata in mail formats (MSG and EML) and mail stores (PST, NSF, and MBX) is extracted differently than other formats. For information on extracting metadata from these formats, see Extract Mail Metadata, on page 38.
NOTE: KeyView can only extract metadata from a document if metadata is defined in the document, and if the document reader can extract metadata for the file format. The section Document Readers, on page 173 lists the file formats for which metadata can be extracted. KeyView does not generate metadata automatically from the document contents.
The sample program FilterTest demonstrates how to extract metadata. See Sample Programs, on page 89.
Extract Metadata for File Filtering
To extract metadata for file filtering
1. Optionally, set the input source using the setInputSource(java.lang.String inFile) method of the Filter object.
2. If the input source was set in step 1, call the getSummaryInfo() method of the Filter object to retrieve an object of the SummaryInfo class. Otherwise, call the getSummaryInfo(java.lang.String inFile) method.
3. Use the methods of the SummaryInfo object to retrieve the metadata information.
Extract Metadata for Stream Filtering
To extract metadata for stream filtering
1. Optionally, set the input source using one of the following methods of the Filter object:
l setInputSource(com.verity.api.SeekableInputStream input)
l setInputSource(java.io.InputStream input, long size)
l setInputSource(java.io.InputStream input)
2. If you set the input source in step 1, call the getSummaryInfo() method of the Filter object to retrieve an object of the SummaryInfo class. Otherwise, call one of the following methods:
l getSummaryInfo(com.verity.api.SeekableInputStream input)
l getSummaryInfo(java.io.InputStream input, long size)
l getSummaryInfo(java.io.InputStream input)
3. Use the methods of the SummaryInfo object to retrieve the metadata information.
TIP: Micro Focus recommends that you provide a SeekableInputStream. See Input/Output Operations, on page 23.
Example
Below is an example of a call to getSummaryInfo():
SummaryInfo[] sinfo = objFilter.getSummaryInfo();
if(sinfo != null) {
    System.out.println("\nSummary info has been extracted.");
    fos_sum = new FileOutputStream(summaryOutFile);
    DataOutputStream dos_sum = new DataOutputStream(fos_sum);
    for(int i=0; i<sinfo.length; i++) {
        if(sinfo[i].getElementName() != null) {
            dos_sum.writeBytes("Element name: " + sinfo[i].getElementName() + "\n");
            dos_sum.writeBytes("Element type: " + sinfo[i].getSumInfoType() + "\n");
            if(sinfo[i].getIsValid() == true) {
                if(sinfo[i].isDateTimeType()) {
                    // Date/time elements are written in string form.
                    dos_sum.writeBytes("Date/time: ");
                    dos_sum.writeBytes(sinfo[i].getDateTime());
                } else {
                    byte[] data = sinfo[i].getData();
                    if(data != null) {
                        dos_sum.writeBytes("Element data: ");
                        dos_sum.write(data);
                    }
                }
            }
            dos_sum.writeBytes("\n\n");
        }
    }
    dos_sum.close();
    fos_sum.close();
}
sinfo=null;


The SummaryInfo class stores the metadata extraction results. After calling the Filter.getSummaryInfo() method, call the get methods provided by each instance of this class to extract metadata:

l getElementName()--Gets the name of the metadata element.
l getSumInfoType()--Specifies the data type of the metadata element. The possible types are:
o KV_String--The value in the metadata field is a string.
o KV_Int4--The value in the metadata field is an integer.
o KV_DateTime--The value in the metadata field is a date and time. This type is a 64-bit value representing the number of 100-nanosecond intervals since January 1, 1601 (the Windows FILETIME epoch). You might need to convert this value into another format. You can also use the isDateTimeType() method to determine whether a metadata element is of date/time type, and then use the getDateTime() method to obtain the date/time in the form of a string.
o KV_ClipBoard--Currently not supported.
o KV_Bool--The value in the metadata field is a boolean.
o KV_Unicode--The value in the metadata field is a Unicode string.
o KV_IEEE8--The value in the metadata field is an IEEE 8-byte integer.
o KV_Other--The value in the metadata field is user-defined.
l getIsValid()--Specifies whether the data value is present in the document. true specifies that the value is valid. For example, if the "Title" element was not populated in the document, getIsValid would return false.
l isDateTimeType()--Determines whether the metadata element is of date/time type.
l getDateTime()--Gets the date and time in the form of a string. If the metadata element is of KV_DateTime type, call this method to get the date and time in the form of a string, for example "Wed Jun 30 21:49:08 1993" or "135 Minutes".
l getData()--Gets the content of the element. If type is KV_Int4 or KV_Bool, data contains the actual value. Otherwise, data is a pointer to the actual value. KV_DateTime and KV_IEEE8 point to an 8-byte value. KV_String and KV_Unicode point to the beginning of the string that contains the text. KV_Unicode is replaced with KV_String when the UNICODE value has been character mapped to the desired output character set.
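Because a KV_DateTime value counts 100-nanosecond intervals since January 1, 1601, it can be converted to a standard Java time value with simple arithmetic. The following minimal sketch performs that conversion using java.time.Instant. The FileTimeConverter class is hypothetical, and the sketch assumes that the 64-bit value is already available as a long; how to decode it from the bytes returned by getData() (for example, the byte order) is not covered here.

```java
import java.time.Instant;

// Hypothetical helper: converts a Windows FILETIME value to java.time.Instant.
final class FileTimeConverter {
    // Seconds between 1601-01-01T00:00:00Z and 1970-01-01T00:00:00Z.
    private static final long EPOCH_DIFFERENCE_SECONDS = 11644473600L;
    // Number of 100-nanosecond intervals in one second.
    private static final long INTERVALS_PER_SECOND = 10000000L;

    static Instant toInstant(long fileTime) {
        long seconds = Math.floorDiv(fileTime, INTERVALS_PER_SECOND) - EPOCH_DIFFERENCE_SECONDS;
        long nanos = Math.floorMod(fileTime, INTERVALS_PER_SECOND) * 100L;
        return Instant.ofEpochSecond(seconds, nanos);
    }

    public static void main(String[] args) {
        // 116444736000000000 intervals after 1601 is the start of 1970.
        System.out.println(toInstant(116444736000000000L));
    }
}
```

If only a display string is needed, the getDateTime() method described above avoids the conversion altogether.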


Convert Character Sets
Filter can convert the character set of a source document to an arbitrary character set specified in the API, or to the character set of the operating system on which the output text is viewed. For this conversion to occur, a source character set must be identified. The source character set can either be determined by the document reader, or can be set in the API. The section Document Readers, on page 173 lists file formats for which character set information can be determined by the document reader. The character sets are defined as constants in the Filter class.

Determine the Character Set of the Output Text
To determine the output character set of a filtered document, Filter considers the following:
l Whether the document reader can determine the character set of the file format. If the document reader cannot determine the character set information for the document type, set the source character set in the API.
l Whether the source character set is specified in the API.
l Whether the target character set is specified in the API.

Guidelines for Character Set Conversion
Below are some rules for the determination of character set mapping:
l If the source is not determined by the document reader or configured in the API, then the character set of the output text is always unknown, regardless of the target character set configuration. The document cannot be converted to a target character set or the operating system's code page unless the source character set is known.
l If the target character set is not specified in the API, and the source character set is identified, then the operating system's code page is used for the output text.
l If the source character set is identified, and the target character set is specified in the API, then the target character set specified in the API is used for the output text.
l For documents that contain multiple character sets, Micro Focus recommends that the target character set be forced to UNICODE or UTF-8.
The following table illustrates how Filter determines the character set of the output text.

Determining the Output Character Set--Example

Source charset      Source charset      Target charset      Output
read by Filter      specified in API    specified in API    charset
No                  No                  No                  no conversion
No                  KVCS_936            No                  OS code page
No                  No                  UNICODE             no conversion
No                  KVCS_936            UNICODE             UNICODE
Yes                 No                  No                  OS code page
Yes                 KVCS_936            No                  OS code page
Yes                 No                  UNICODE             UNICODE
Yes                 KVCS_936            UNICODE             UNICODE
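The guidelines above reduce to a short decision procedure. The following minimal sketch expresses them in Java so that the expected output character set can be checked for a given configuration. It only models the rules stated in this section; the OutputCharsetRule class is hypothetical and does not call the KeyView API, and the result strings are descriptive labels rather than KeyView constants.

```java
// Hypothetical model of the character set guidelines; not part of the KeyView API.
final class OutputCharsetRule {
    /**
     * @param detectedByReader whether the document reader determined the source character set
     * @param sourceInApi      source character set specified in the API, or null if none
     * @param targetInApi      target character set specified in the API, or null if none
     */
    static String resolve(boolean detectedByReader, String sourceInApi, String targetInApi) {
        boolean sourceKnown = detectedByReader || sourceInApi != null;
        if (!sourceKnown) {
            return "no conversion";   // an unknown source cannot be converted
        }
        if (targetInApi == null) {
            return "OS code page";    // default target when none is specified
        }
        return targetInApi;           // an explicit target is used
    }

    public static void main(String[] args) {
        System.out.println(resolve(false, "KVCS_936", null));
        System.out.println(resolve(true, null, "UNICODE"));
    }
}
```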

Set the Character Set During Filtering
You can convert the character set of a file at the time the file is filtered. To specify the source character set, use the setSourceCharSet(java.lang.String charset) method. For example:
objFilter.setSourceCharSet(sourceCharSet);
To specify the target character set, instantiate the Filter object using the constructor Filter(java.lang.String outputCharSet, long filterFlags). For example:
objFilter = new Filter(outputCharSet, filterFlags);
Set the Character Set During Subfile Extraction
You can convert the character set of a subfile at the time the subfile is extracted from the container and before it is filtered. This is most often used to set the character set of a mail message's body text. See Filter PDF Files, on the next page for more information.
To specify the source and target character set of a subfile
1. Use the methods of the ExtSubFileExtractConfig object to set the source and target character set.
2. Call the extExtractSubFile method of the Filter object and pass in the ExtSubFileExtractConfig object.
For example:
extconfig = new ExtSubFileExtractConfig();
extconfig.setSourceCharset(m_sourceCharSet);
extconfig.setTargetCharset(m_outputCharSet);
extinfo = m_objFilter.extExtractSubFile(extContextID, i, extconfig);
Prevent the Default Conversion of a Character Set
You can prevent the default conversion of text to the operating system code page, and specify that Filter retain the original character encoding of the document when it is available. Any document identified as containing more than one character encoding is converted to the first encoding encountered in the file.
To prevent the default conversion, instantiate the Filter object using the constructor Filter(java.lang.String outputCharSet, long filterFlags), and set the filterFlags argument to FILTERFLAG_NODEFAULTCHARSETCONVERT. For example:
objFilter = new Filter(outputCharSet, Filter.FILTERFLAG_NODEFAULTCHARSETCONVERT);
This setting overrides the source or target character set specified in the API.
Extract Tracked Deleted Text
The revision tracking feature in applications--such as Microsoft Word's Track Changes--marks changes to a document (typically, strikethrough for deleted text and underline for inserted text) and tracks each change by reviewer name and date. If revision tracking was enabled when text was deleted from a source document, you can configure Filter to extract the deleted text. Filter does not extract the reviewer name and revision date. Deleted text is excluded from the filtered output by default.
To extract deleted text from a document and include it in the filtered output, call the includeRevisionMark method. For example:
if(inclRevisionMark == true) {
    objFilter.includeRevisionMark();
}
To reset the flag and exclude deleted text from the filtered output, call the excludeRevisionMark method. For example:
if(inclRevisionMark == false) {
    objFilter.excludeRevisionMark();
}
Filter PDF Files
Filter has special configuration options that allow greater control over the conversion of Adobe Acrobat PDF files.

Use the pdf2sr Reader
The pdf2sr reader is an alternative that can be used instead of pdfsr for filtering PDF files. It uses a different parsing technology and may yield better results for some files. The pdf2sr reader has the following features:
l supports standard and custom metadata (non-XMP)
l supports basic text extraction
l supports password-protected PDFs
l supports table detection (see Table Detection for PDF Files, on page 71)
The pdf2sr reader has the following limitations:
l does not support logical order
l does not support bidi PDFs
l does not extract subfiles
l does not extract bookmarks from PDFs
l does not estimate the percentage of embedded fonts that match the displayed glyphs
l does not support XMP metadata
l does not support headers or footers
l supports annotations only in the raster output, not as searchable text
l does not support content access stream
l does not support tagged content (PDFs)
l does not filter text from XFA-based PDF forms
l does not report document restrictions (see Document Restrictions, on page 88)
To use the pdf2sr reader
1. Open the formats.ini file with a text editor.
2. In the [Formats] section, set the following:
200=pdf2
Filter PDF Files to a Logical Reading Order
The PDF format is primarily designed for presentation and printing of brochures, magazines, forms, reports, and other materials with complex visual designs. Most PDF files do not contain the logical structure of the original document--the correct reading order, for example, and the presence and meaning of significant elements such as headers, footers, columns, tables, and so on. KeyView can filter a PDF file either by using the file's internal unstructured paragraph flow, or by applying a structure to the paragraphs to reproduce the logical reading order of the visual page.

Logical reading order enables KeyView to output PDF files that contain languages that read from right-to-left (such as Hebrew and Arabic) in the correct reading direction.

NOTE: The algorithm used to reproduce the reading order of a PDF page is based on common page layouts. The paragraph flow generated for PDFs with unique or complex page designs might not emulate the original reading order exactly.
For example, page design elements such as drop caps, callouts that cross column boundaries, and significant changes in font size might disrupt the logical flow of the output text.

By default, KeyView produces an unstructured text stream for PDF files. This means that PDF paragraphs are extracted in the order in which they are stored in the file, not the order in which they appear on the visual page. For example, a three-column article could be output with the headers and title at the end of the output file, and the second column extracted before the first column. Although this output does not represent a logical reading order, it accurately reflects the internal structure of the PDF.
You can configure KeyView to produce a structured text stream that flows in a specified direction. This means that PDF paragraphs are extracted in the order (logical reading order) and direction (left-to-right or right-to-left) in which they appear on the page.
The following paragraph direction options are available:

l Left-to-right--Paragraphs flow logically and read from left to right. You should specify this option when most of your documents are in a language that uses a left-to-right reading order, such as English or German.
l Right-to-left--Paragraphs flow logically and read from right to left. You should specify this option when most of your documents are in a language that uses a right-to-left reading order, such as Hebrew or Arabic.
l Dynamic--Paragraphs flow logically. The PDF filter determines the paragraph direction for each PDF page, and then sets the direction accordingly. Filter uses this option when a paragraph direction is not specified.

NOTE: Filtering might be slower when logical reading order is enabled. For optimal speed, use an unstructured paragraph flow.
The paragraph direction options control the direction of paragraphs on a page; they do not control the text direction in a paragraph. For example, a PDF file might contain English paragraphs in three columns that read from left to right, but 80% of the second paragraph might contain Hebrew characters. If the left-to-right logical reading order is enabled, the paragraphs are ordered logically in the output--title paragraph, then paragraph 1, 2, 3, and so on--and flow from the top left of the first column to the bottom right of the third column. However, the text direction of the second paragraph is determined independently of the page by the PDF filter, and is output from right to left.


NOTE: Extraction of metadata is not affected by the paragraph direction setting. The characters and words in metadata fields are extracted in the correct reading direction regardless of whether logical reading order is enabled.

Enable Logical Reading Order
You can enable logical reading order by using either the API or the formats.ini file. Setting the paragraph direction in the API overrides the setting in the formats.ini file.

Use the Java API
To enable PDF logical reading order in the API, use the setPDFLogicalOrder(int orderFlag) method, and set the orderFlag argument to one of the following flags:

l PDF_LOGICAL_ORDER_LTR--Logical reading order and left-to-right paragraph direction.
l PDF_LOGICAL_ORDER_RTL--Logical reading order and right-to-left paragraph direction.
l PDF_LOGICAL_ORDER_AUTO--Logical reading order. The PDF reader determines the paragraph direction for each PDF page, and then sets the direction accordingly. Filter uses this option when a paragraph direction is not specified.
l PDF_LOGICAL_ORDER_RAW--Unstructured paragraph flow. This is the default behavior. If logical reading order is enabled, and you want to return to an unstructured paragraph flow, set this flag.

For example:
objFilter.setPDFLogicalOrder(Filter.PDF_LOGICAL_ORDER_RTL);
The FilterTest sample program demonstrates this method. See FilterTest, on page 98.

Use the formats.ini File
The formats.ini file is in the directory install\OS\bin, where install is the path name of the Filter installation directory and OS is the name of the operating system.
To enable logical reading order by using the formats.ini file
1. Change the PDF reader entry in the [Formats] section of the formats.ini file as follows:
[Formats]
200=lpdf
2. Optionally, add the following section to the end of the formats.ini file:
[pdf_flags]
pdf_direction=paragraph_direction
where paragraph_direction is one of the following:
l LPDF_LTR--Left-to-right paragraph direction.
l LPDF_RTL--Right-to-left paragraph direction.
l LPDF_AUTO--The PDF filter determines the paragraph direction for each PDF page, and then sets the direction accordingly. Filter uses this option when a paragraph direction is not specified.
l LPDF_RAW--Unstructured paragraph flow. This is the default behavior. If logical reading order is enabled, and you want to return to an unstructured paragraph flow, set this flag.

Rotated Text
When a PDF that contains rotated text is filtered, the rotated text is extracted after the text at the end of the PDF page on which the rotated text appears. If the PDF is filtered with logical order enabled, and the amount of rotated text on a page surpasses a predefined threshold, the page is automatically output as an unstructured text stream. You cannot configure this threshold.

Extract Custom Metadata from PDF Files
To extract custom metadata from your PDF files, add the custom metadata names to the pdfsr.ini file provided, and copy the modified file to the bin directory. You can then extract metadata as you normally would.
The pdfsr.ini file is in the directory samples\pdfini, and has the following structure:
<META>
<TOTAL>total_item_number</TOTAL>,
/metadata_tag_name datatype,
</META>

l total_item_number--The total number of metadata tags that are listed.
l metadata_tag_name--The metadata tag name used in the PDF files.
l datatype--The data type of the metadata element. The possible types are:
o KV_String
o KV_Int4
o KV_DateTime
o KV_ClipBoard
o KV_Bool
o KV_Unicode
o KV_IEEE8
o KV_Other

For example:

<META>
<TOTAL>4</TOTAL>
/part_number INT4
/volume INT4
/purchase_date DATETIME
/customer STRING
</META>
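Because the TOTAL value must match the number of tags that are listed, it can be convenient to generate the block from a list of tag names. The following minimal sketch builds such a block in Java. The PdfsrIniBuilder class is hypothetical; the layout (one tag per line, no separators other than a space) follows the example above and is an assumption, so compare the result with the pdfsr.ini file provided in samples\pdfini before you use it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: renders a <META> block for pdfsr.ini.
final class PdfsrIniBuilder {
    // Keys are tag names without the leading slash; values are data type names.
    static String build(Map<String, String> tagTypes) {
        StringBuilder sb = new StringBuilder();
        sb.append("<META>\n");
        // The total is derived from the map so that it cannot drift from the list.
        sb.append("<TOTAL>").append(tagTypes.size()).append("</TOTAL>\n");
        for (Map.Entry<String, String> tag : tagTypes.entrySet()) {
            sb.append('/').append(tag.getKey()).append(' ').append(tag.getValue()).append('\n');
        }
        sb.append("</META>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Tag names and type names are copied from the example in this section.
        Map<String, String> tags = new LinkedHashMap<String, String>();
        tags.put("part_number", "INT4");
        tags.put("volume", "INT4");
        tags.put("purchase_date", "DATETIME");
        tags.put("customer", "STRING");
        System.out.print(build(tags));
    }
}
```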

Skip Embedded Fonts
Text in PDF files sometimes contains embedded fonts. If you experience difficulties filtering embedded fonts, there are options in the API, the formats.ini file, and the FilterTest sample program that you can set to skip this type of text.
NOTE: If you choose to skip embedded fonts, none of the content that contains embedded fonts is included in the output.

Use the formats.ini File
To skip embedded fonts using the formats.ini file
l Set the following parameters:
[pdf_flags]
skipembeddedfont=TRUE
embedded_font_threshold=threshold
where threshold is a value between 0 and 100. A threshold of 100 skips all embedded font text; a threshold of 0 retains all embedded font text. Set skipembeddedfont to TRUE to enable the embedded_font_threshold parameter.
The default value of embedded_font_threshold is 100. If you set skipembeddedfont to TRUE and do not specify the embedded_font_threshold parameter, Filter skips all embedded text.
When you use formats.ini to skip embedded fonts, you can also specify an embedded font threshold, which is an arbitrary percentage probability that the glyph in the embedded text maps to a character value in the output character set (ASCII, UTF-8, and so on).

For example, if you specify a threshold of 75, embedded text glyphs that have a 75% or greater probability of correctly matching the character in the output character set are included in the output; glyphs that have a probability of less than 75% of matching the output character set are omitted from the output.
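The following minimal sketch models the threshold rule from the example above: glyphs whose match probability is at or above the threshold are kept, and the rest are omitted. It is an illustration only. The EmbeddedFontThreshold and Glyph classes and the probability values are hypothetical, and the sketch does not describe how KeyView calculates the probability or how it treats the boundary values 0 and 100.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the embedded font threshold; not part of the KeyView API.
final class EmbeddedFontThreshold {
    static final class Glyph {
        final char mappedCharacter;
        final int matchProbability; // percentage, 0 to 100

        Glyph(char mappedCharacter, int matchProbability) {
            this.mappedCharacter = mappedCharacter;
            this.matchProbability = matchProbability;
        }
    }

    // Keeps glyphs whose probability is greater than or equal to the threshold.
    static String keepLikelyGlyphs(List<Glyph> glyphs, int threshold) {
        if (threshold < 0 || threshold > 100) {
            throw new IllegalArgumentException("threshold must be between 0 and 100");
        }
        StringBuilder output = new StringBuilder();
        for (Glyph glyph : glyphs) {
            if (glyph.matchProbability >= threshold) {
                output.append(glyph.mappedCharacter);
            }
        }
        return output.toString();
    }

    public static void main(String[] args) {
        List<Glyph> glyphs = Arrays.asList(
            new Glyph('A', 90), new Glyph('B', 75), new Glyph('C', 40));
        System.out.println(keepLikelyGlyphs(glyphs, 75));
    }
}
```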
Use the Java API
To skip embedded fonts using the Java API, call the setSkipEmbeddedFont(boolean) method with a value of true. For example:
objFilter.setSkipEmbeddedFont(true);
The FilterTest sample program demonstrates this method. See FilterTest, on page 98.
Control Hyphenation
There are two types of hyphens in a PDF document:
l A soft hyphen is added to a word by a word processor to divide the word across two lines. This is a discretionary hyphen and is used to ensure proper text flow in justified text.
l A hard hyphen is intentionally added to a word regardless of the word's position in the text flow. It is required by the rules of grammar or word usage. For example, compound words (such as three-week vacation and self-confident) contain hard hyphens.
By default, KeyView skips the source document's soft hyphens in the Filter output to provide more searchable text content. However, if you want to maintain the document layout, you can keep soft hyphens in the Filter output. To keep soft hyphens, you must enable the soft hyphen flag in formats.ini or in the API.
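The effect of the two settings on searchability can be illustrated with a few lines of Java. The following sketch is a generic illustration, not KeyView's implementation: it assumes that a soft hyphen is represented by the Unicode SOFT HYPHEN character (U+00AD), which might not match how a particular PDF encodes it, and the SoftHyphenPolicy class is hypothetical. Hard hyphens are ordinary hyphen characters and are left untouched in both cases.

```java
// Hypothetical illustration of skipping or keeping soft hyphens.
final class SoftHyphenPolicy {
    private static final char SOFT_HYPHEN = '\u00AD'; // Unicode SOFT HYPHEN

    static String apply(String text, boolean keepSoftHyphen) {
        if (keepSoftHyphen) {
            return text; // layout-preserving: leave the text unchanged
        }
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c != SOFT_HYPHEN) {
                sb.append(c); // hard hyphens ('-') are kept
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String source = "self-confi\u00ADdent";
        // With soft hyphens skipped, a search for "self-confident" matches.
        System.out.println(apply(source, false).equals("self-confident"));
    }
}
```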
Use the formats.ini File
To keep soft hyphens by using the formats.ini file, set the following parameter:
[pdf_flags]
keepsofthyphen=TRUE
Use the Java API
To keep soft hyphens using the Java API, call the setKeepSoftHyphen(boolean) method with a value of true. For example:
objFilter.setKeepSoftHyphen(true);
The FilterTest sample program demonstrates this method. See FilterTest, on page 98.
Filter Portfolio PDF Files
Portfolio PDF files contain subfiles and an ActionScript interface for navigating between them. You can use the extraction API to extract the subfiles. See Extract Subfiles from PDF Files, on page 51.

Table Detection for PDF Files
PDF files often contain data presented in a tabular form. However, there is no information about the table stored within the PDF itself; the text is simply placed in an arrangement that looks like a table to the human eye. When this data is filtered, it can be very difficult to reconstruct the table. If table detection is enabled, KeyView attempts to recognize tables within PDF pages, and to reconstruct them before they are output. For each page of the document, KeyView outputs the contents of each table first, and then outputs all remaining text on the page. Micro Focus recommends that you also enable tab delimited output when using table detection. This means that any tables detected appear in the output text in tab delimited format.
To enable table detection and tab delimited output
l In the Java API, call the setTableDetection and setTabDelimited methods on the filter object, for example:
filter.setTableDetection(true);
filter.setTabDelimited(true);
l In formats.ini, set the following parameters. (This is an alternative approach; you do not need to do this if you have configured this feature through the API.)
[Options]
TableDetection=TRUE
TabDelimited=TRUE
NOTE: Table detection is only available with the pdf2sr reader. To enable this reader, set the following configuration parameter in formats.ini:
[Formats]
200=pdf2
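When tables are output in tab delimited format, an application can split the filtered text back into rows and cells. The following minimal sketch shows one way to do this with the standard library. It assumes that each table row occupies one line of output and that any line containing a tab character belongs to a table; both are assumptions about the output rather than documented guarantees, and the TabDelimitedRows class is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: splits tab delimited text into rows and cells.
final class TabDelimitedRows {
    static List<List<String>> parse(String filteredText) {
        List<List<String>> rows = new ArrayList<List<String>>();
        for (String line : filteredText.split("\\r?\\n")) {
            if (line.indexOf('\t') < 0) {
                continue; // lines without a tab are treated as ordinary text
            }
            // A limit of -1 keeps trailing empty cells.
            rows.add(Arrays.asList(line.split("\t", -1)));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Placeholder text standing in for filtered output.
        String text = "Region\tSales\nWest\t120\nSome paragraph text.\n";
        for (List<String> row : parse(text)) {
            System.out.println(row);
        }
    }
}
```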
Filter Spreadsheet Files
Filter has special configuration options that allow greater control over the conversion of spreadsheet files.
Filter Worksheet Names
Normally, Filter does not extract worksheet names from a spreadsheet because it is assumed the text should not be exposed. You can change this default behavior, and extract worksheet names by adding the following lines to the formats.ini file:
[Options]
getsheetnames=1


Filter Hidden Text in Microsoft Excel Files
Normally, Filter does not filter hidden text from a Microsoft Excel spreadsheet because it is assumed the text should not be exposed. You can change this default behavior, and extract text from hidden rows, columns, and sheets from Excel spreadsheets by adding the following lines to the formats.ini file:
[Options]
gethiddeninfo=1
Specify Date and Time Format on UNIX Systems
In Microsoft Excel you can choose to format dates and times according to the system locale. On Windows, KeyView uses the system locale settings to determine how these dates and times should be formatted. In other operating systems, KeyView uses the U.S. short date format (mm/dd/yyyy). You can change this by specifying the formats you wish to use in the formats.ini file.
To specify the system date and time format on UNIX systems
l In the formats.ini file, specify the following options:
o SysDateTime. The format to use when a cell is formatted using the system format including both the date and the time.
o SysLongDate. The format to use when a cell is formatted using the system long date format.
o SysShortDate. The format to use when a cell is formatted using the system short date format.
o SysTime. The format to use when a cell is formatted using the system time format.
NOTE: These values cannot contain spaces.
For example, if you specify SysDateTime=%d/%m/%Y, dates and times are extracted in the following format:
28/02/2008
The format arguments are the same as those for the strftime() function. Refer to the following webpage for more information.
http://linux.die.net/man/3/strftime
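To preview what a strftime-style value produces before you put it in formats.ini, a small stand-alone helper can be useful. The following is a minimal sketch, not part of the KeyView API: the class StrftimePreview and its methods are hypothetical, it supports only a handful of directives (%d, %m, %Y, %y, %H, %M, %S, %%), and it rejects values that contain spaces, as the note above requires.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper (not a KeyView class): previews a subset of strftime directives.
final class StrftimePreview {

    /** Translates a strftime-style format into a java.time pattern. */
    static String toJavaPattern(String strftime) {
        if (strftime.indexOf(' ') >= 0) {
            throw new IllegalArgumentException("format values cannot contain spaces");
        }
        StringBuilder pattern = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < strftime.length(); i++) {
            char c = strftime.charAt(i);
            if (c != '%') {
                literal.append(c);          // collect literal text such as '/' or ':'
                continue;
            }
            if (++i == strftime.length()) {
                throw new IllegalArgumentException("dangling % at end of format");
            }
            flushLiteral(literal, pattern);
            char directive = strftime.charAt(i);
            switch (directive) {
                case 'd': pattern.append("dd"); break;    // day of month, 01-31
                case 'm': pattern.append("MM"); break;    // month, 01-12
                case 'Y': pattern.append("yyyy"); break;  // four-digit year
                case 'y': pattern.append("yy"); break;    // two-digit year
                case 'H': pattern.append("HH"); break;    // hour, 00-23
                case 'M': pattern.append("mm"); break;    // minute
                case 'S': pattern.append("ss"); break;    // second
                case '%': literal.append('%'); break;     // literal percent sign
                default:
                    throw new IllegalArgumentException("unsupported directive: %" + directive);
            }
        }
        flushLiteral(literal, pattern);
        return pattern.toString();
    }

    /** Writes pending literal text as one quoted run, doubling any single quotes. */
    private static void flushLiteral(StringBuilder literal, StringBuilder pattern) {
        if (literal.length() > 0) {
            pattern.append('\'').append(literal.toString().replace("'", "''")).append('\'');
            literal.setLength(0);
        }
    }

    static String format(LocalDateTime value, String strftime) {
        return value.format(DateTimeFormatter.ofPattern(toJavaPattern(strftime)));
    }

    public static void main(String[] args) {
        // Same format as the SysDateTime example above.
        System.out.println(format(LocalDateTime.of(2008, 2, 28, 14, 30, 0), "%d/%m/%Y"));
    }
}
```

The helper only approximates strftime; locale-dependent directives such as %x or %c are deliberately left unsupported.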
Filter Very Large Numbers in Spreadsheet Cells to Precision Numbers
Numbers in Microsoft Excel files can now be extracted and written to the output without formatting. By default, numbers are extracted in the format specified by the Excel file (for example, General, Currency and Date). Spreadsheets might contain cells that have very large numbers in them. Excel displays the numbers in a scientific notation that rounds or truncates the numbers. To extract numbers without formatting, add the following options in the formats.ini file:


[Options]
ignoredefnumformats=1

Extract Microsoft Excel Formulas
When you filter a Microsoft Excel spreadsheet, KeyView extracts the value of each cell. The value of a cell might be calculated from a formula, but the formula is not included in the output unless you configure KeyView to include it. You can extract the cell value, the formula, or both. For example, if you choose to extract both the cell value and the formula, the output might look like this:
245 = SUM(B21:B26)
In this example, the calculated value from the cell is 245 and the formula from which the value is derived is SUM(B21:B26).
NOTE: Depending on the complexity of the formulas, enabling formula extraction might result in slightly slower performance.

To extract formulas
l In the Java API, call the setShowFormulas method on the filter object, for example:
filter.setShowFormulas(Filter.ShowFormulas.VALUES_AND_FORMULAS);
l You can extract formulas by adding the following parameter to formats.ini:
[Options]
getformulastring=option
where option is one of the following:

Option  Description
0       Extract the cell value only. This is the default.
1       Extract the formula only.
2       Extract the formula and the cell value.

If a function in a formula is invalid, and option 1 or 2 is specified, only the calculated value is extracted.

Standardize Cell Formats
In Microsoft Excel you can format cell values. For example, the date "15/09/2021" could be formatted as "15 September 2021" or "2021-09-15". By default, KeyView extracts cell values with formatting, as they would appear in Excel. If you prefer, you can configure KeyView to standardize cell values.


To standardize cell formats
l In the Java API, call the setStandardizeCellFormats method on the filter object, for example:
filter.setStandardizeCellFormats(true);
l In formats.ini, set the following parameter. (This is an alternative approach - you do not need to do this if you have configured this feature through the API).
[Options]
StandardizeCellFormats=TRUE
When this feature is enabled, KeyView formats any cell where a number has been entered according to the following rules.

Numbers
Numbers are printed to the maximum length entered, that is, the full number put into the cell, without any rounding. Negative numbers are printed with a dash in front of them (as opposed to, for example, bracket form).
The following table provides some examples.

Example              Formatted value   KeyView (standardized) output
Rounded number       600               600.1
Scientific notation  1.56E+04          15600
Fraction             17/20             0.85
Percentage           46%               0.46
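The rules above can be illustrated with plain java.math.BigDecimal arithmetic. This is a minimal sketch of the output conventions only; the class StandardizedNumberSketch is hypothetical and is not how KeyView is implemented. In particular, KeyView works from the value stored in the cell, so it can output 600.1 for a cell displayed as 600, which no conversion of the displayed text could recover.

```java
import java.math.BigDecimal;
import java.math.MathContext;

// Hypothetical illustration of the standardized number conventions (not KeyView code).
final class StandardizedNumberSketch {

    /** Plain decimal text: no exponent, no trailing zeros, leading '-' for negatives. */
    static String plain(BigDecimal value) {
        return value.stripTrailingZeros().toPlainString();
    }

    /** "1.56E+04" -> "15600" */
    static String fromScientific(String text) {
        return plain(new BigDecimal(text));
    }

    /** 17/20 -> "0.85" */
    static String fromFraction(long numerator, long denominator) {
        BigDecimal quotient = new BigDecimal(numerator)
                .divide(new BigDecimal(denominator), MathContext.DECIMAL64);
        return plain(quotient);
    }

    /** "46%" -> "0.46" */
    static String fromPercent(String text) {
        String digits = text.replace("%", "").trim();
        return plain(new BigDecimal(digits).movePointLeft(2));
    }

    /** Accounting-style "(600.1)" -> "-600.1" */
    static String fromBracketed(String text) {
        String digits = text.replace("(", "").replace(")", "").trim();
        return plain(new BigDecimal(digits).negate());
    }

    public static void main(String[] args) {
        System.out.println(fromScientific("1.56E+04"));
        System.out.println(fromFraction(17, 20));
        System.out.println(fromPercent("46%"));
        System.out.println(fromBracketed("(600.1)"));
    }
}
```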

Text
All text that is part of the format string is stripped, including currency symbols.
Dates
All dates are printed in full ISO-8601 format (that is YYYY-MM-DDTHH:MM:SS). There are two exceptions to this rule:
l Cases where the date format contains a time delta (that is, "[h]", "[m]", or "[s]"). In this case, the time is displayed as an interval, which is the number of days (where a day is defined as a period of 24 hours). The time is printed in the ISO-8601 time interval form, for example P1.234D.
l Cases where the absolute value of the cell is less than 1.0, and the date format contains only time components. In Excel, values between 0.0 and 1.0 correspond to the fictional date 1900-01-00, and are used to express times without an associated date. For example:

Value  Date format  KeyView output
0.5    hh:mm:ss     12:00:00
0.5    dd hh        1900-01-00 12:00:00
1.5    hh:mm:ss     1900-01-01 12:00:00
1.5    dd hh        1900-01-01 12:00:00
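To see how a numeric cell value relates to the dates and times shown above, the following sketch converts an Excel serial number with java.time. It is an illustration under stated assumptions, not KeyView's implementation: the class ExcelSerialSketch is hypothetical, it assumes Excel's 1900 date system (serial 1 is 1900-01-01, and serial 60 is the nonexistent 1900-02-29 that Excel keeps for compatibility), and it writes date-times with the T separator named in the format description.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration of Excel serial values (not KeyView code).
final class ExcelSerialSketch {
    private static final long SECONDS_PER_DAY = 86_400L;
    private static final DateTimeFormatter TIME = DateTimeFormatter.ofPattern("HH:mm:ss");
    private static final DateTimeFormatter DATE_TIME =
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss");

    /** Time without a date, for values in [0, 1): 0.5 is midday. */
    static String timeOnly(double serial) {
        if (serial < 0.0 || serial >= 1.0) {
            throw new IllegalArgumentException("expected a value in [0, 1)");
        }
        long seconds = Math.min(Math.round(serial * SECONDS_PER_DAY), SECONDS_PER_DAY - 1);
        return LocalTime.ofSecondOfDay(seconds).format(TIME);
    }

    /** Full date and time, for values of 1.0 or more in the 1900 date system. */
    static String dateTime(double serial) {
        if (serial < 1.0) {
            throw new IllegalArgumentException("values below 1.0 have no real calendar date");
        }
        long totalSeconds = Math.round(serial * SECONDS_PER_DAY);
        long days = totalSeconds / SECONDS_PER_DAY;
        long secondOfDay = totalSeconds % SECONDS_PER_DAY;
        if (days == 60) {
            throw new IllegalArgumentException("serial 60 is Excel's nonexistent 1900-02-29");
        }
        // Serials after the phantom leap day are shifted by one relative to the real calendar.
        LocalDate base = days < 60 ? LocalDate.of(1899, 12, 31) : LocalDate.of(1899, 12, 30);
        return LocalDateTime.of(base.plusDays(days), LocalTime.ofSecondOfDay(secondOfDay))
                .format(DATE_TIME);
    }

    /** Interval in days for time-delta formats such as [h], for example P1.234D. */
    static String interval(double serial) {
        return "P" + BigDecimal.valueOf(serial).stripTrailingZeros().toPlainString() + "D";
    }

    public static void main(String[] args) {
        System.out.println(timeOnly(0.5));
        System.out.println(dateTime(1.5));
        System.out.println(interval(1.234));
    }
}
```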

Filter Presentation Files to a Logical Reading Order
With some file formats, for example Microsoft PowerPoint presentations, the order of the text inside the file has no relation to the layout of the text on the page or screen. Recently modified text might appear at the end of a file, even though that text belongs at the beginning of the document. You can configure KeyView to process position information and sort the extracted text so that it is returned in the correct (reading) order.
NOTE: This feature supports Microsoft PowerPoint files only.
To enable logical reading order
l In the Java API, call the method setFilterLogicalOrder on the Filter object.
l In the formats.ini file, find the [Options] section, and set LogicalOrder to 1. (This is an alternative approach - you do not need to do this if you have configured this feature through the API). For example:
[Options]
LogicalOrder=1
Related Topics
l Filter PDF Files to a Logical Reading Order, on page 65
Filter HTML Files
KeyView can filter comments from HTML documents. To enable comment filtering, you must set a flag in the formats.ini file. The formats.ini file is in the install\OS\bin directory, where install is the Filter installation directory and OS is the name of the operating system.


To enable filtering of comments from HTML files
1. Open the formats.ini file in a text editor.
2. Under [Options], set the following flag.
GetHTMLHiddenInfo=1
Filter XML Files
Filter SDK enables you to extract all or selected content from source XML files. You can specify the elements and attributes extracted from a document using the API or an INI file (see Configure Element Extraction for XML Documents, below). Filter detects the following XML formats:
l generic XML
l Microsoft Office 2003 XML (Word, Excel, and Visio)
l StarOffice/OpenOffice XML (text document, presentation, and spreadsheet)
See File Format Detection, on page 234 for more information on format detection.
Configure Element Extraction for XML Documents
When filtering XML files, you can specify which elements and attributes are extracted according to the file's format ID or root element. This is useful when you want to extract only relevant text elements, such as abstracts from reports, or a list of authors from an anthology.
A root element is an element in which all other elements are contained. In the XML sample below, book is the root element:
<book>
  <title>XML Introduction</title>
  <product id="33-657" status="draft">XML Tutorial</product>
  <chapter>Introduction to XML
    <para>What is HTML</para>
    <para>What is XML</para>
  </chapter>
  <chapter>XML Syntax
    <para>Elements must have a closing tag</para>
    <para>Elements must be properly nested</para>
  </chapter>
</book>
For example, you could specify that when filtering files with the root element book, the element title is extracted as metadata, and only product elements with a status attribute value of draft are extracted.
When you extract an element, the child elements within the element are also extracted. For example, if you extract the element chapter from the sample above, the child element para is also extracted.
Filter SDK defines default element extraction settings for the following XML formats:

l generic XML
l Microsoft Office 2003 XML (Word, Excel, and Visio)
l StarOffice/OpenOffice XML (text document, presentation, and spreadsheet)
These settings are defined internally and are used when filtering these file formats; however, you can modify their values.
In addition to the default extraction settings, you can also add custom settings for your own XML document types. If you do not define custom settings for your own XML document types, the settings for generic XML are used.
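The idea of selecting elements by root element, element name, and attribute value can be tried out with the standard JAXP DOM parser. The following is a minimal sketch that uses the book sample from this section; the class ElementExtractionSketch is hypothetical and does not call KeyView. It also shows that the text of child elements comes along with the parent, as described for chapter and para.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical illustration of element selection (plain JAXP, not KeyView code).
final class ElementExtractionSketch {

    static final String SAMPLE =
            "<book><title>XML Introduction</title>"
            + "<product id=\"33-657\" status=\"draft\">XML Tutorial</product>"
            + "<chapter>Introduction to XML <para>What is HTML</para> <para>What is XML</para> </chapter>"
            + "<chapter>XML Syntax <para>Elements must have a closing tag</para>"
            + " <para>Elements must be properly nested</para> </chapter></book>";

    /**
     * Returns the text of each matching element, including the text of its children.
     * The settings apply only when the document's root element equals {@code root}.
     * Pass null for attrName to match the element regardless of its attributes.
     */
    static List<String> extract(String xml, String root, String element,
                                String attrName, String attrValue) {
        List<String> found = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            if (!doc.getDocumentElement().getTagName().equals(root)) {
                return found; // different document type: these settings do not apply
            }
            NodeList nodes = doc.getElementsByTagName(element);
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                if (attrName == null || e.getAttribute(attrName).equals(attrValue)) {
                    // getTextContent concatenates the text of all descendants
                    found.add(e.getTextContent().replaceAll("\\s+", " ").trim());
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(extract(SAMPLE, "book", "product", "status", "draft"));
        System.out.println(extract(SAMPLE, "book", "chapter", null, null));
    }
}
```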
Modify Element Extraction Settings
You can modify configuration settings for XML documents through either the API or the kvxconfig.ini file.
Use the Java API
You can use the Java API to modify the settings for the standard XML document types or add configuration settings for your own XML document types.
To modify settings
1. Declare an array of XMLConfigSet objects.
2. Create an instance of ConfigOption with the following arguments:
a. Set the OptionType to CFG_SETXMLCONFIGINFO.
b. Set the OptionValue to 0.
c. Set OptionData to the array object.
3. Call the setConfigOption method, and pass in the ConfigOption instance.
4. Call a filter method.
For example:
XMLConfigSet[] XMLInfo;
ConfigOption config=new ConfigOption(Filter.CFG_SETXMLCONFIGINFO, 0, XMLInfo);
objFilter.setConfigOption(config);
Use an Initialization File
You can use the initialization file to modify the settings for the standard XML document types or add configuration settings for your own XML document types.
To modify settings
1. Modify the kvxconfig.ini file.
2. Use the initialization file when processing the XML file.
See Modify Element Extraction Settings in the kvxconfig.ini File, on the next page.
The Java sample program FilterTest demonstrates how to use the initialization file in the filtering process. See Sample Programs, on page 89.


Modify Element Extraction Settings in the kvxconfig.ini File
The kvxconfig.ini file contains default element extraction settings for supported XML formats. The file is in the directory install\OS\bin, where install is the path name of the Filter installation directory and OS is the name of the operating system. For example, the following entry defines extraction settings for the Microsoft Visio 2003 XML format:
[config3]
eKVFormat=MS_Visio_XML_Fmt
szRoot=
szInMetaElement=DocumentProperties
szExMetaElement=PreviewPicture
szInContentElement=Text
szExContentElement=
szInAttribute=
The following options are available:

Configuration Option Description

eKVFormat

The format ID as detected by the KeyView detection module. This determines the file type to which these extraction settings apply. See File Format Detection, on page 234 for more information on format ID values.
If you are adding configuration settings for a custom XML document type, this is not defined.

szRoot

The file's root element. When the format ID is not defined, the root element is used to determine the file type to which these settings apply.
To further qualify the element, specify its namespace. See Specify an Element's Namespace and Attribute, on the next page.

szInMetaElement

The elements extracted from the file as metadata. All other elements are extracted as text.
Separate multiple entries with commas. To further qualify the element, specify its namespace, its attributes, or both. See Specify an Element's Namespace and Attribute, on the next page.

szExMetaElement

The child elements in the included metadata elements that are not extracted from the file as metadata. For example, the default extraction settings for the Visio XML format extract the DocumentProperties element as metadata. This element includes child elements such as Title, Subject, Author, Description, and so on. However, the child element PreviewPicture is defined in szExMetaElement because it is binary data and should not be extracted.
You cannot exclude any metadata elements from the output for StarOffice files. All metadata is extracted regardless of this setting.
Separate multiple entries with commas. To further qualify the element, specify its namespace, its attributes, or both. See Specify an Element's Namespace and Attribute, on the next page.


szInContentElement

The elements extracted from the file as content text. Enter an asterisk (*) to extract all elements including child elements.
Separate multiple entries with commas. To further qualify the element, specify its namespace, its attributes, or both. See Specify an Element's Namespace and Attribute, below.

szExContentElement

The child elements in the included content elements that are not extracted from the file as content text.
Separate multiple entries with commas. To further qualify the element, specify its namespace, its attributes, or both. See Specify an Element's Namespace and Attribute, below.

szInAttribute

The attribute values extracted from the file. If attributes are not defined here, attribute values are not extracted.
Enter the namespace (if used), element name, and attribute name in the following format:
namespace:elementname@attributename
For example:
microfocus:division@name
Separate multiple entries with commas.

Specify an Element's Namespace and Attribute
To further qualify an element, you can specify that the element exist in a certain namespace and/or contain a specific attribute. To define the namespace and attribute of an element, enter the following:
ns_prefix:elemname@attribname=attribvalue
NOTE: You must enclose attribute values that contain spaces in quotation marks.
For example, the entry bg:language@id=xml extracts a language element in the namespace bg that contains the attribute name id with the value of "xml". This entry extracts the following element from an XML file:
<bg:language id="xml">XML is a simple, flexible text format derived from SGML</bg:language>
but does not extract:
<bg:language id="sgml">SGML is a system for defining markup languages.</bg:language>
or
<adv:language id="xml">The namespace should be a Uniform Resource Identifier (URI).</adv:language>

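To check an entry before adding it to kvxconfig.ini, it can help to split it into its parts and test it against sample elements. The following is a minimal sketch of the ns_prefix:elemname@attribname=attribvalue syntax described above. The class ElementSelectorSketch is hypothetical and is not KeyView's parser. Treating a missing namespace prefix as "any namespace" is an assumption made for this illustration.

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for entries such as bg:language@id=xml (not KeyView code).
final class ElementSelectorSketch {
    // [prefix:]element[@attribute[=value]]; a value with spaces must be in double quotes.
    private static final Pattern SYNTAX = Pattern.compile(
            "(?:([^:@=\\s]+):)?([^:@=\\s]+)"
            + "(?:@([^:@=\\s]+)(?:=(?:\"([^\"]*)\"|([^\"\\s]+)))?)?");

    final String prefix;     // null when no namespace prefix is given
    final String element;
    final String attribute;  // null when no attribute is given
    final String value;      // null when no attribute value is given

    private ElementSelectorSketch(String prefix, String element, String attribute, String value) {
        this.prefix = prefix;
        this.element = element;
        this.attribute = attribute;
        this.value = value;
    }

    static ElementSelectorSketch parse(String entry) {
        Matcher m = SYNTAX.matcher(entry.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("not a valid entry: " + entry);
        }
        String value = m.group(4) != null ? m.group(4) : m.group(5);
        return new ElementSelectorSketch(m.group(1), m.group(2), m.group(3), value);
    }

    /** Tests an element, given its prefix (may be null), local name, and attributes. */
    boolean matches(String elemPrefix, String elemName, Map<String, String> attributes) {
        if (prefix != null && !prefix.equals(elemPrefix)) {
            return false; // assumption: an entry without a prefix matches any namespace
        }
        if (!element.equals(elemName)) {
            return false;
        }
        if (attribute == null) {
            return true;
        }
        if (!attributes.containsKey(attribute)) {
            return false;
        }
        return value == null || value.equals(attributes.get(attribute));
    }

    public static void main(String[] args) {
        ElementSelectorSketch entry = parse("bg:language@id=xml");
        System.out.println(entry.matches("bg", "language", Collections.singletonMap("id", "xml")));
        System.out.println(entry.matches("bg", "language", Collections.singletonMap("id", "sgml")));
        System.out.println(entry.matches("adv", "language", Collections.singletonMap("id", "xml")));
    }
}
```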
Add Configuration Settings for Custom XML Document Types
You can define element extraction settings for custom XML document types by adding the settings to the kvxconfig.ini file. For example, for files that contain the root element microfocusxml, you can add the following section to the end of the initialization file:
[config101]
eKVFormat=
szRoot=microfocusxml
szInMetaElement=dc:title,dc:meta@title,dc:meta@name=title
szExMetaElement=
szInContentElement=microfocus:division@name=keyview,microfocus:division@name=idol,p@style="Heading 1"
szExContentElement=
szInAttribute=microfocus:division@name
The custom extraction settings must be preceded by a section heading named [configN], where N is an integer starting at 100 and increasing by 1 for each additional file type, as in [config100], [config101], [config102], and so on. The default extraction settings for the supported XML formats are numbered config0 to config99. Currently only 0 to 6 are used.
Since a custom XML document type is not recognized by the KeyView detection module, the format ID is not defined. The file type is identified by the file's root element only.
If a custom XML document type is not defined in the kvxconfig.ini file or by the setConfigOption method, then the default extraction settings for a generic XML document are used.
Configure Headers and Footers
You can configure custom header and footer tags for word processing and spreadsheet documents by editing the formats.ini file.
To configure headers and footers
1. Open the formats.ini file.
2. In the [Options] section, add the following items:
header_start_tag=HeaderStart
header_end_tag=HeaderEnd
footer_start_tag=FooterStart
footer_end_tag=FooterEnd
For example:
header_start_tag=<myHeaderTag>
header_end_tag=</myHeaderTag>
footer_start_tag=<myFooterTag>
footer_end_tag=</myFooterTag>


NOTE: You must encode custom tags in UTF-8.
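Once custom tags are configured, an application that consumes the filtered text can use them to locate header and footer content. The following is a minimal sketch that assumes the configured start and end tags appear in the output immediately around each header or footer; the class HeaderFooterTagSketch and the sample text are hypothetical, and the code does not call KeyView.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical post-processing of filtered text (not KeyView code).
final class HeaderFooterTagSketch {

    /** Returns the text found between each startTag/endTag pair, in document order. */
    static List<String> between(String text, String startTag, String endTag) {
        if (startTag.isEmpty() || endTag.isEmpty()) {
            throw new IllegalArgumentException("tags must not be empty");
        }
        List<String> found = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = text.indexOf(startTag, from);
            if (start < 0) {
                break;
            }
            int contentStart = start + startTag.length();
            int end = text.indexOf(endTag, contentStart);
            if (end < 0) {
                break; // unterminated region: ignore it
            }
            found.add(text.substring(contentStart, end));
            from = end + endTag.length();
        }
        return found;
    }

    public static void main(String[] args) {
        String filtered = "<myHeaderTag>Quarterly Report</myHeaderTag>Body text"
                + "<myFooterTag>Page 1</myFooterTag>";
        System.out.println(between(filtered, "<myHeaderTag>", "</myHeaderTag>"));
        System.out.println(between(filtered, "<myFooterTag>", "</myFooterTag>"));
    }
}
```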

Error Messages

When a KeyView exception is thrown, it might be caused by one of the following errors.

Exception  Description
KVERR_Success  Function completed successfully.
KVERR_DLLNotFound  A DLL or shared library was not found.
KVERR_OutOfCore  Memory allocation failure.
KVERR_processCancelled  Callback function returns FALSE.
KVERR_badInputStream  Invalid or corrupt input stream.
KVERR_badOutputType  Invalid output is requested.
KVERR_General  General error.
KVERR_FormatNotSupported  File format is not supported.
KVERR_PasswordProtected  File is encrypted or password-protected. KeyView only supports secure PST, NSF, and ZIP files.
KVERR_ADSNotFound  Adobe Document Server not found. This error is obsolete.
KVERR_AutoDetFail  Autodetect error.
KVERR_AutoDetNoFormat  Unable to detect file format.
KVERR_ReaderInitError  Error initializing the reader.
KVERR_NoReader  No reader available for this format.
KVERR_CreateOutputFileFailed  Unable to create output file. If the overwrite flag in setOverWrite is FALSE and a subfile has the same name as a file in the target path, this error is generated.
KVERR_CreateTempFileFailed  Unable to create temporary file.
KVERR_ErrorWritingToOutputFile  Error writing to output file.
KVERR_CreateProcessFailed  Error creating a child process.
KVERR_WaitForChildFailed  Wait for child process failed.


KVERR_ChildTimeOut  Child process hung/timed out.
KVERR_ArchiveFileNotFound  Attempt to extract nonexistent file.
KVERR_ArchiveFatalError  Fatal error processing an archive file.
KVError_OpenStreamFailure = KVERR_ArchiveFatalError +1  Failed to open a stream during out-of-process filtering.
KVError_InterfaceFunctionNotFound  An interface function was not found during out-of-process filtering.
KVError_InputFileNotFound  Could not find the input file during out-of-process filtering.
KVError_OpenOutputFileFailed  Could not open the output file during out-of-process filtering.
KVError_MemoryLeak  Memory leak occurred during out-of-process filtering.
KVError_MemoryOverwrite  Memory overwrite occurred during out-of-process filtering.
KVError_GPF  Exception occurred during out-of-process filtering.
KVError_OopCore  Memory dump was generated in a child process during out-of-process filtering.
KVError_KVoopLogFailed  Creation of out-of-process error log failed.
KVError_OverNestedFileLimit  The container file has more than the allowable number of child documents. One or more child documents were not converted. Currently, this is not used.

KVError_PSTAccessFailed  The PST file could not be converted. This error might be returned when a call to extOpenDocument returns NULL for one of the following reasons:
l Microsoft Outlook client is not installed
l Microsoft Outlook client is installed, but is not the default email client
l Microsoft Outlook client is installed, but is not configured correctly
l PST file is corrupt
l PST file is read-only (PST files must allow read and write access)
l MAPI call fails
l The bit editions of Microsoft Outlook do not match the bit editions of the KeyView software. For example, if 32-bit KeyView is used, 32-bit Outlook must be installed. If 64-bit KeyView is used, 64-bit Outlook must be installed.
KVError_PasswordRequired  To open the file, credentials must be provided. This error might be returned when a call to extOpenDocument returns NULL.
KVError_InvalidArgs  The input argument or structure is invalid. This is generated by the File Extraction APIs.
KVError_OutputFileExists  A file with the same name already exists in the output directory. This error is generated when extracting a subfile from a container file with the setOverWrite flag set to FALSE, and a file by the same name already exists in the output directory.
KVError_ReaderUsageDenied  The current license key does not enable the document reader required to filter the file. This error might be returned when a call to extOpenDocument returns NULL.
Some document readers are considered advanced features and are licensed separately from the KeyView SDK (for example, the PST and MBX readers). Contact your Micro Focus sales representative to get an updated license key.
KVError_OopBadConfig  Information in the kvxconfig.ini file is incomplete and cannot be used to filter the XML file.
KVError_OopBrokenPipe  Data was not transferred between the parent and child processes during out-of-process filtering because either the parent or child failed.
KVError_OopPipeOEF  Data was not transferred between the parent and child processes during out-of-process filtering because the parent process was shut down.
KVError_IPCTimeOut  Either the parent or child process is waiting for a reply or request during out-of-process filtering.
KVError_InvalidOopDriverSignature  A client sent a request to the File Extraction out-of-process server, but the context driver does not exist on the server.
KVError_InvalidOopServiceSignature  A client sent a request to a File Extraction out-of-process server that does not exist. If this error is generated on the call to fpClose(), it can be ignored.

Tab Delimited Output for Embedded Tables
You can use KeyView to convert embedded tables in Word Processing documents (for example, Microsoft Word documents), and tables detected by Optical Character Recognition (OCR), to tab-delimited form. This inserts a tab character between each cell, and a line break between each row. Tab and line break characters in the cells are replaced with spaces.
To enable tab delimited output for embedded tables
l In the Java API, call the setTabDelimited method on the filter object, for example:
filter.setTabDelimited(true);
l In formats.ini, set the following parameter. (This is an alternative approach - you do not need to do this if you have configured this feature through the API).
[Options]
TabDelimited=TRUE
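Because tabs and line breaks inside cells are replaced with spaces, every tab in the table output separates two cells and every line break ends a row, so the text can be split back into rows and cells without any escaping rules. The following is a minimal sketch of such a consumer; the class TabDelimitedTableSketch and the sample text are hypothetical, and the code assumes you have already isolated the table portion of the filtered output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical consumer of tab-delimited table text (not KeyView code).
final class TabDelimitedTableSketch {

    /** Splits table text into rows (one per line) and cells (one per tab-separated field). */
    static List<List<String>> parse(String tableText) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : tableText.split("\\r?\\n")) {
            if (line.isEmpty()) {
                continue; // skip blank lines between or after rows
            }
            // A limit of -1 keeps trailing empty cells instead of dropping them.
            rows.add(Arrays.asList(line.split("\t", -1)));
        }
        return rows;
    }

    public static void main(String[] args) {
        String table = "Name\tQty\nWidget\t3\nGadget\t\n";
        for (List<String> row : parse(table)) {
            System.out.println(row);
        }
    }
}
```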
Exclude Japanese Guide Text
This option prevents output of Japanese phonetic guide text when Microsoft Excel (.xlsx) files are processed.
To prevent output of Japanese phonetic guide text
l In the Java API, call the setNoPhoneticGuides method on the filter object, for example:
filter.setNoPhoneticGuides(true);
l In formats.ini, set the following parameter. (This is an alternative approach - you do not need to do this if you have configured this feature through the API).
[Options]
NoPhoneticGuides=TRUE
Source Code Identification
When KeyView auto-detects a file that contains source code, it can attempt to identify the programming language that it is written in. When you do not enable source code identification, files containing source code may be identified as ASCII text files, causing the application to treat them in the same way as ordinary text. However, in many instances, it can be useful to route these files elsewhere or filter them out. For example, indexing source code into an IDOL index has minimal value and could bloat the engine with terms that are of no use in retrieval. You can use source code identification to identify files containing a particular programming language as a more specific format.

NOTE: Source code identification is available only on certain platforms (see source code identification in the platform differences section).

You can set source code identification to different levels.

Option  Description
KVSOURCECODE_OFF  Do not enable source code identification.
KVSOURCECODE_ENABLED  Enable source code identification for the most common source code formats.
KVSOURCECODE_EXTENDED  Enable source code identification for all supported source code formats. This option might lead to false positives in some cases (for example, a C++ file might get identified as a rarer format).

For the complete list of source code formats supported for both options, see Supported Formats, on page 105.
To configure source code identification
l In the Java API, call the setSourceCodeDetection method on the filter object, for example:
filter.setSourceCodeDetection(Filter.SourceCodeDetection.ENABLED);
l In formats.ini, set the following parameter to the appropriate level. (This is an alternative approach - you do not need to do this if you have configured this feature through the API).
[Options]
SourceCodeDetection=KVSOURCECODE_ENABLED

Optical Character Recognition
When processing raster image files, KeyView can perform Optical Character Recognition (OCR) to attempt to filter text that might be visible in the image. If text is detected to form part of a table, it will be filtered in the same way as tables in Word Processing documents.
NOTE: KeyView performs OCR only on standalone raster files, not on images embedded inside other documents. For embedded images, you must first extract the images by using the Extract Images option.
NOTE: OCR is available only on certain platforms (see Optical Character Recognition in the platform differences section).
If your license includes OCR, it is enabled by default.

To enable or disable OCR
l Call the setOcr method of the Filter class.
Optimize OCR Performance
The default settings for OCR attempt to detect as much text as possible. For example, KeyView attempts to detect text in multiple languages and alphabets, and rotated text in increments of 90 degrees from upright. This increases the amount of text that can be detected, prioritizing recall over processing time. If you know what you will be processing in advance, you can specify OCR options to improve performance. To configure OCR through the Java API, call the method filter.setOcr. For example, if the input is scanned pages that contain only English or only Japanese text, the following configuration could result in a performance improvement. However, it may fail to recognize text in some images such as landscape pages where the text is not upright.
filter.setOcr(new OCROptions("en ja", OCROptions.Orientation.UPRIGHT, OCROptions.DetectAlphabet.LISTED));
Languages
OCR supports many different languages. For a list of supported languages, see OCR Supported Languages, on page 278. If you know that your files only contain text in a certain language or a small number of languages, you can improve both processing speed and accuracy by configuring OCR with this information.
Orientation
By default, OCR attempts to detect text that appears rotated, in 90-degree increments from upright. This means that KeyView can filter text from an image, even if it has been rotated or was scanned upside-down. If you know that your images contain only upright text, you can improve processing speed by disabling this feature.
Alphabet Detection
Sometimes, if you do not know the language of the input text in advance of processing, you might specify multiple languages. OCR requires more processing time for each additional language, especially when the languages span multiple alphabets (Latin, Cyrillic, Chinese, Arabic, and so on). You can configure OCR to detect the alphabet for each image, before attempting to recognize characters. You can choose one of the following options.
l Off. By default, OCR does not detect the alphabet. Use this option when you have specified a single language or multiple languages that use the same alphabet. Micro Focus also recommends this option when you expect an image to use multiple alphabets (for example, when there is English and Arabic text on the same page).


l Listed. OCR detects the alphabet, but only considers alphabets that are represented in your chosen list of languages. This option can reduce the time required to recognize characters, because languages that do not match the detected alphabet are ignored. For example, if you set languages="en ja ko" (English, Japanese, and Korean) and OCR detects the Latin alphabet, OCR ignores the Japanese and Korean languages. Micro Focus recommends using this option when each source image uses a single alphabet, and the list of possible languages is known but spans multiple alphabets.
l Any. OCR detects the alphabet that is used, and considers all alphabets. This option can reduce the time required to recognize characters, because languages that do not match the detected alphabet are ignored. If none of your chosen languages match the detected alphabet, OCR does not recognize characters and there is no output. Micro Focus recommends using this option instead of Listed when you want to reject images that do not match any of the specified languages.
If your input contains Chinese, Japanese, or Korean text with some ASCII characters, you can safely set this parameter to any of the available options, because OCR includes ASCII characters for those languages.
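The difference between the three options is easiest to see as a filter over the configured language list. The following is a conceptual model only, not KeyView code: the class AlphabetDetectionSketch, the language-to-alphabet map, and the idea of a ranked list of detected alphabets are all assumptions made for illustration. The model reproduces the behavior described above: Off keeps every language, Listed picks the best-ranked alphabet that appears in your language list, and Any takes the best-ranked alphabet overall and yields nothing when no configured language uses it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual model of alphabet detection modes (hypothetical, not KeyView code).
final class AlphabetDetectionSketch {
    enum Mode { OFF, LISTED, ANY }

    // Illustrative mapping only; the real set of languages and alphabets is larger.
    private static final Map<String, String> ALPHABET_OF = new HashMap<>();
    static {
        ALPHABET_OF.put("en", "LATIN");
        ALPHABET_OF.put("fr", "LATIN");
        ALPHABET_OF.put("ru", "CYRILLIC");
        ALPHABET_OF.put("ar", "ARABIC");
        ALPHABET_OF.put("ja", "JAPANESE");
        ALPHABET_OF.put("ko", "KOREAN");
    }

    /**
     * Returns the configured languages that remain in play after alphabet detection.
     * rankedAlphabets lists the detector's candidates, most likely first.
     */
    static List<String> languagesToTry(Mode mode, String languages, List<String> rankedAlphabets) {
        List<String> configured = Arrays.asList(languages.trim().split("\\s+"));
        if (mode == Mode.OFF) {
            return configured; // no detection: every configured language is tried
        }
        String chosen = null;
        for (String alphabet : rankedAlphabets) {
            if (mode == Mode.ANY || usesAlphabet(configured, alphabet)) {
                chosen = alphabet;
                break;
            }
        }
        List<String> kept = new ArrayList<>();
        for (String language : configured) {
            if (chosen != null && chosen.equals(ALPHABET_OF.get(language))) {
                kept.add(language);
            }
        }
        return kept; // empty means no recognition is attempted, so there is no output
    }

    private static boolean usesAlphabet(List<String> languages, String alphabet) {
        for (String language : languages) {
            if (alphabet.equals(ALPHABET_OF.get(language))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> detected = Arrays.asList("CYRILLIC", "LATIN");
        System.out.println(languagesToTry(Mode.OFF, "en ja ko", detected));
        System.out.println(languagesToTry(Mode.LISTED, "en ja ko", detected));
        System.out.println(languagesToTry(Mode.ANY, "en ja ko", detected));
    }
}
```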

Configure the Proxy for RMS

When KeyView needs to access contents that are protected by the Microsoft Rights Management System (RMS), it must make HTTP requests. By default, KeyView uses the system proxy settings for these requests.
To use different proxy settings, you can configure them in the [RMS] section of the formats.ini configuration file. The following table describes the available options.

Parameter: UseSystemProxy
Description: Whether to obtain details about your HTTP proxy from the system. By default, this parameter is set to TRUE, which means:
• On Microsoft Windows platforms, KeyView reads the proxy settings that are configured in the Windows Control Panel.
• On Linux, KeyView reads the proxy settings from environment variables such as HTTP_PROXY and HTTPS_PROXY.
You can use UseSystemProxy instead of setting the other proxy parameters (ProxyHost, ProxyPort, ProxyUsername, and ProxyPassword). When UseSystemProxy is set to TRUE, you must remove these other parameters from your configuration.
Set UseSystemProxy to FALSE to use different proxy settings. In this case you must set at least ProxyHost and ProxyPort.

Parameter: ProxyHost
Description: The host name or IP address of the proxy server.

Parameter: ProxyPassword
Description: The password to use to authenticate with the proxy server.

Parameter: ProxyPort
Description: The port of the proxy server to use to access the repository. This port must be greater than 0, and less than 65535.

Parameter: ProxyUsername
Description: The user name to use to authenticate with the proxy server.
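The rules in the table can be checked before you write the values into the [RMS] section. The following is a minimal sketch of such a check. The class and method are hypothetical helpers and are not part of KeyView; the sketch also assumes that any value other than FALSE means TRUE, which might not match how KeyView parses the setting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of the KeyView API.
final class RmsProxySettingsCheck {
    // Returns a list of problems; an empty list means the settings follow the table.
    static List<String> check(Map<String, String> rms) {
        List<String> problems = new ArrayList<>();
        // UseSystemProxy defaults to TRUE when it is not set (assumed parsing rule).
        boolean useSystem = !"FALSE".equalsIgnoreCase(rms.getOrDefault("UseSystemProxy", "TRUE"));
        String[] explicit = {"ProxyHost", "ProxyPort", "ProxyUsername", "ProxyPassword"};
        if (useSystem) {
            for (String name : explicit) {
                if (rms.containsKey(name)) {
                    problems.add(name + " must be removed when UseSystemProxy is TRUE");
                }
            }
            return problems;
        }
        if (!rms.containsKey("ProxyHost")) {
            problems.add("ProxyHost is required");
        }
        String port = rms.get("ProxyPort");
        if (port == null) {
            problems.add("ProxyPort is required");
        } else {
            try {
                int value = Integer.parseInt(port.trim());
                if (value <= 0 || value >= 65535) {
                    problems.add("ProxyPort must be greater than 0 and less than 65535");
                }
            } catch (NumberFormatException e) {
                problems.add("ProxyPort is not a number");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Hypothetical values for illustration.
        System.out.println(check(Map.of(
                "UseSystemProxy", "FALSE",
                "ProxyHost", "proxy.example.com",
                "ProxyPort", "8080")));
    }
}
```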

Document Restrictions
Some applications, and corresponding file formats, allow users to restrict the ways in which a document can be used. For example, you might be able to read a document but additional credentials (such as a password) could be required to modify the document content, add comments, or print the document. The restrictions might not be enforced by encryption, but instead rely on any software that accesses the file to respect the restrictions that have been set.
TIP: These restrictions are not file system permissions (for example, making a file read-only). They are restrictions applied by the software package that created the file.
KeyView can report whether a document is protected by write restrictions, for the following file formats. A write restriction is defined as any restriction, enforced by a password, that prevents a user from editing the document content.
• Adobe Portable Document Format (.PDF)
• Microsoft Word (.DOCX)
• Microsoft Excel (.XLSX)
• Microsoft PowerPoint (.PPTX)
To determine whether a document is protected by restrictions
• In the Java API, use the method getRestrictions on the filter object. For example:
    Restrictions restrictions = filter.getRestrictions("document.docx");


Chapter 5: Sample Programs

This section describes the sample programs provided with Filter SDK.

· Introduction 89
· ExtractFilter 90
· FilterFileByChunk 92
· FilterFileToFile 93
· FilterFileToStream 94
· FilterStreamByChunk 95
· FilterStreamToFile 96
· FilterStreamToStream 97
· FilterTest 98

Introduction
The following Java sample programs are provided:
• ExtractFilter
• FilterFileByChunk
• FilterFileToFile
• FilterFileToStream
• FilterStreamByChunk
• FilterStreamToFile
• FilterStreamToStream
• FilterTest
The source code for the programs is in the directory javaapi/sample. Included alongside the source code are compiled .class files, and the following Batch (.bat) and C Shell (.csh) files that help run the corresponding program:
• FilterFileToFile.bat (.csh)
• FilterStreamToStream.bat (.csh)
• FilterFileToStream.bat (.csh)
• FilterStreamToFile.bat (.csh)
• FilterFileByChunk.bat (.csh)
• FilterStreamByChunk.bat (.csh)


The sample programs pass license information to KeyView through the Filter constructor. This is the method recommended by Micro Focus. Before the sample code can be compiled, you must replace the placeholders YOUR_LICENSE_ORGANIZATION and YOUR_LICENSE_KEY with your license information.
The compiled .class files that are supplied in the SDK have an embedded trial license, which expires approximately five months after release. If the environment variables KV_SAMPLE_PROGRAM_LICENSE_ORGANIZATION and KV_SAMPLE_PROGRAM_LICENSE_KEY are set, those values are used instead. This allows you to use the programs after the embedded trial license has expired, and to test or troubleshoot with your own license.
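The fallback from environment variables to embedded values can be sketched as follows. This is an illustration of the behavior described above, not the code of the sample programs. The class and method names are hypothetical, and the sketch assumes that both variables must be set and non-empty before they replace the embedded values.

```java
import java.util.Map;

// Hypothetical helper; the shipped sample programs may implement this differently.
final class SampleLicenseLookup {
    static final String ORG_VAR = "KV_SAMPLE_PROGRAM_LICENSE_ORGANIZATION";
    static final String KEY_VAR = "KV_SAMPLE_PROGRAM_LICENSE_KEY";

    // Returns {organization, key}: the environment values when both are set,
    // otherwise the embedded values (assumed rule).
    static String[] resolve(Map<String, String> env, String embeddedOrg, String embeddedKey) {
        String org = env.get(ORG_VAR);
        String key = env.get(KEY_VAR);
        if (org != null && !org.isEmpty() && key != null && !key.isEmpty()) {
            return new String[] {org, key};
        }
        return new String[] {embeddedOrg, embeddedKey};
    }

    public static void main(String[] args) {
        String[] license = resolve(System.getenv(),
                "YOUR_LICENSE_ORGANIZATION", "YOUR_LICENSE_KEY");
        // Print only the organization; the key should not be written to logs.
        System.out.println("Organization: " + license[0]);
    }
}
```

Passing the environment in as a map keeps the lookup easy to test without changing the real process environment.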
NOTE: The sample programs that demonstrate the use of an input stream show filtering from a java.io.InputStream object. In KeyView version 12.9 and later, the stream methods are overloaded to allow you to pass a com.verity.api.SeekableInputStream implementation into KeyView. Micro Focus recommends this option, as it allows KeyView to seek about in the file, only reading the parts it needs to read.
If you do need to use a Java InputStream, and you know the stream length, using the method overload that passes in the size might allow KeyView to avoid caching the whole file.
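To see why a seekable source helps, compare it with a plain InputStream, which can only be read from front to back. The following sketch uses the standard java.io.RandomAccessFile class to read a small range near the end of a file without reading the bytes before it. It does not implement com.verity.api.SeekableInputStream, whose methods are not shown in this section; it only illustrates the access pattern that seeking makes possible.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Standard-library illustration; not an implementation of the KeyView interface.
final class SeekVersusStreamSketch {
    // Reads 'length' bytes starting at 'offset' without reading the rest of the file.
    static byte[] readRange(Path file, long offset, int length) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            byte[] buffer = new byte[length];
            raf.seek(offset);
            raf.readFully(buffer);
            return buffer;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("seek-sketch", ".bin");
        try {
            Files.write(file, "HEADER....body....TRAILER".getBytes(StandardCharsets.US_ASCII));
            long size = Files.size(file);
            // Jump straight to the last seven bytes.
            byte[] tail = readRange(file, size - 7, 7);
            System.out.println(new String(tail, StandardCharsets.US_ASCII));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```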

ExtractFilter
The ExtractFilter program demonstrates the File Extraction interface. The FilterTest sample program demonstrates the functionality of the Filtering interface. See FilterTest, on page 98. The ExtractFilter program demonstrates the following functionality:
• opens a document
• extracts subfiles from a document
• repeats subfile extraction until all subfiles are extracted
• enables you to specify the command-line options listed in the following table
To run ExtractFilter
1. Add the location of the javaapi\KeyView.jar file, the javaapi\sample directory, and the Filter bin directory to the CLASSPATH environment variable.
2. Type the following:
   java -Djava.library.path=bin_directory ExtractFilter [options] bin_directory input_file output_dir
   where,
   • bin_directory is the path to the Filter bin directory.
   • options is one or more of the options listed in the following table.
   • input_file is the path and file name of the source file.
   • output_dir is the path of the folder to write the output files to. This folder does not have to exist.


Options for ExtractFilter Sample Program

Option | Description
-extonly | Extracts the subfiles from a source file but does not filter the files after extraction.
-ext-fbody | Extracts the formatted version of the message body (HTML or RTF) from mail files when possible.
-source-cs charset | Sets the character set of the source file. charset is a character set defined in the Filter class. See Coded Character Sets, on page 215.
-target-cs charset | Sets the character set of the output file. charset is a character set defined in the Filter class. See Coded Character Sets, on page 215.
-little-end | Sets the byte order for Unicode text to Little Endian.
-is | Sets the input as a stream. The default is file.
-os | Sets the output as a stream. The default is file.
-ip | Runs file extraction in the same process as the calling application (in process). See Run Filter In Process, on page 27.
-open-user username | Specifies the user name used to open a protected PST file.
-open-pass password | Specifies the password used to open a protected PST file.
-openidfile idfile | Specifies the user ID file used to open a protected PST file.
-opencreateroot | Creates a root directory on which a hierarchy can be based. See Create a Root Node, on page 37.
-ext-nodir | Specifies that the subfile directory structure is not created.
-extnoheader | Excludes mail header information from the extracted message body text file. See Exclude Metadata from the Extracted Text File, on page 43.
-meta outfile | Extracts default mail metadata and writes it to a file. See Extract Mail Metadata, on page 38.
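If you start ExtractFilter from another Java program rather than from a shell, the command line above can be assembled as a list of arguments. The following is a minimal sketch. The class name, the paths, and the file names are hypothetical; only the argument order is taken from the command shown in this section.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that assembles the documented command line.
final class ExtractFilterCommandSketch {
    static List<String> command(String binDir, List<String> options,
                                String inputFile, String outputDir) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-Djava.library.path=" + binDir);
        cmd.add("ExtractFilter");
        cmd.addAll(options);   // for example -extonly
        cmd.add(binDir);       // bin_directory is passed again as a program argument
        cmd.add(inputFile);
        cmd.add(outputDir);
        return cmd;
    }

    public static void main(String[] args) {
        // Hypothetical paths and file names, for illustration only.
        List<String> cmd = command("/opt/keyview/bin", List.of("-extonly"),
                "mail.pst", "extracted");
        System.out.println(String.join(" ", cmd));
        // To launch it, pass the list to new ProcessBuilder(cmd).inheritIO().start(),
        // with CLASSPATH set as described in step 1.
    }
}
```

Keeping each argument as a separate list element avoids quoting problems when a path contains spaces.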


FilterFileByChunk
The FilterFileByChunk program filters an input file to an output file using the Java API method doFilterChunk(). The method filters an input source and returns one chunk of output data. The program calls the method repeatedly until the entire file is processed.
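The call-until-done pattern described above has the same shape as a standard stream copy loop. The following sketch shows that shape using only java.io classes. It does not call doFilterChunk(), whose signature and return type are not shown in this section; the read call stands in for "get the next chunk", and the class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Standard-library analogy for chunked processing; not KeyView code.
final class ChunkLoopSketch {
    // Copies 'in' to 'out' one chunk at a time and returns the number of chunks.
    static int copyByChunk(InputStream in, OutputStream out, int chunkSize) throws IOException {
        byte[] chunk = new byte[chunkSize];
        int chunks = 0;
        int n;
        // Keep asking for the next chunk until the source reports that it is finished.
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int chunks = copyByChunk(new ByteArrayInputStream(new byte[10]), out, 4);
        System.out.println(chunks + " chunks, " + out.size() + " bytes");
    }
}
```

Processing one chunk at a time keeps memory use bounded by the chunk size rather than by the size of the whole output.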

Run FilterFileByChunk on Windows

To run FilterFileByChunk on Windows
1. In the FilterFileByChunk.bat file, set the following variables.
   INSTALL_DIR: The absolute path of the KeyView Filter SDK installation directory.
   PLATFORM: The platform name.
2. Run the batch file in the directory install\javaapi\sample, where install is the path name of the Filter installation directory. Type the following:
   filterfilebychunk inputfile outputfile
   where,
   • inputfile is the path and file name of the source file.
   • outputfile is the path and file name of the generated file. If a path is not specified, the file is output to the current directory.

Run FilterFileByChunk on UNIX

To run FilterFileByChunk on UNIX
1. In the FilterFileByChunk.csh file, set the following variables.
   INSTALL_DIR: The absolute path of the KeyView Filter SDK installation directory.
   PLATFORM: The platform name.
2. Run the C shell file in the directory install/javaapi/sample, where install is the path name of the Filter installation directory. Type the following:
   ./FilterFileByChunk.csh inputfile outputfile
   where,
   • inputfile is the path and file name of the source file.
   • outputfile is the path and file name of the generated file. If a path is not specified, the file is output to the current directory.


FilterFileToFile
The FilterFileToFile program filters an input file to an output file using Java API methods in Filter. It demonstrates the following functions:
• filters an input file to an output file.
• extracts the character set if it can be determined by the document reader.
• extracts file format information (document type, format, version, and so on) if available in the source document.
• extracts metadata if available in the source document. This program extracts all the metadata from the document, but only displays the first element of metadata.

Run FilterFileToFile on Windows

To run FilterFileToFile on Windows
1. In the FilterFileToFile.bat file, set the following variables.
   INSTALL_DIR: The absolute path of the KeyView Filter SDK installation directory.
   PLATFORM: The platform name.
2. Run the batch file in the directory install\javaapi\sample, where install is the path name of the Filter installation directory. Type the following:
   filterfiletofile inputfile outputfile
   where,
   • inputfile is the path and file name of the source file.
   • outputfile is the path and file name of the generated file. If a path is not specified, the file is output to the current directory.

Run FilterFileToFile on UNIX

To run FilterFileToFile on UNIX
1. In the FilterFileToFile.csh file, set the following variables.
   INSTALL_DIR: The absolute path of the KeyView Filter SDK installation directory.
   PLATFORM: The platform name.
2. Run the C shell file in the directory install/javaapi/sample, where install is the path name of the Filter installation directory. Type the following:
   ./FilterFileToFile.csh inputfile outputfile
   where,
   • inputfile is the path and file name of the source file.
   • outputfile is the path and file name of the generated file. If a path is not specified, the file is output to the current directory.

FilterFileToStream
The FilterFileToStream program filters an input file to an output stream using Java API methods in Filter.

Run FilterFileToStream on Windows

To run FilterFileToStream on Windows
1. In the FilterFileToStream.bat file, set the following variables.
   INSTALL_DIR: The absolute path of the KeyView Filter SDK installation directory.
   PLATFORM: The platform name.
2. Run the batch file in the directory install\javaapi\sample, where install is the path name of the Filter installation directory. Type the following:
   filterfiletostream inputfile
   where,
   • inputfile is the path and file name of the source file.
   • The generated text is output to the current DOS prompt.

Run FilterFileToStream on UNIX

To run FilterFileToStream on UNIX
1. In the FilterFileToStream.csh file, set the following variables.
   INSTALL_DIR: The absolute path of the KeyView Filter SDK installation directory.
   PLATFORM: The platform name.
2. Run the C shell file in the directory install/javaapi/sample, where install is the path name of the Filter installation directory. Type the following:
   ./FilterFileToStream.csh inputfile
   where,
   • inputfile is the path and file name of the source file.
   • The generated text is output to the current console (standard out).

FilterStreamByChunk
The FilterStreamByChunk program filters an input stream to an output stream using the Java API method doFilterChunk(). The method filters an input source and returns one chunk of output data. The program calls the method repeatedly until the entire output buffer is processed.
NOTE: In KeyView version 12.9 and later, Micro Focus recommends that you implement a com.verity.api.SeekableInputStream. See Input/Output Operations, on page 23.

Run FilterStreamByChunk on Windows

To run FilterStreamByChunk on Windows
1. In the FilterStreamByChunk.bat file, set the following variables.
   INSTALL_DIR: The absolute path of the KeyView Filter SDK installation directory.
   PLATFORM: The platform name.
2. Run the batch file in the directory install\javaapi\sample, where install is the path name of the Filter installation directory. Type the following:
   filterstreambychunk inputfile outputfile
   where,
   • inputfile is the path and file name of the source file.
   • outputfile is the path and file name of the generated file. If a path is not specified, the file is output to the current directory.

Run FilterStreamByChunk on UNIX

To run FilterStreamByChunk on UNIX
1. In the FilterStreamByChunk.csh file, set the following variables.
   INSTALL_DIR: The absolute path of the KeyView Filter SDK installation directory.
   PLATFORM: The platform name.
2. Run the C shell file in the directory install/javaapi/sample, where install is the path name of the Filter installation directory. Type the following:
   ./FilterStreamByChunk.csh inputfile outputfile
   where,
   • inputfile is the path and file name of the source file.
   • outputfile is the path and file name of the generated file. If a path is not specified, the file is output to the current directory.

FilterStreamToFile
The FilterStreamToFile program filters an input stream to an output file using Java API methods in Filter.
NOTE: In KeyView version 12.9 and later, Micro Focus recommends that you implement a com.verity.api.SeekableInputStream. See Input/Output Operations, on page 23.

Run FilterStreamToFile on Windows

To run FilterStreamToFile on Windows
1. In the FilterStreamToFile.bat file, set the following variables.
   INSTALL_DIR: The absolute path of the KeyView Filter SDK installation directory.
   PLATFORM: The platform name.
2. Run the batch file in the directory install\javaapi\sample, where install is the path name of the Filter installation directory. Type the following:
   filterstreamtofile inputfile outputfile
   where,
   • inputfile is the path and file name of the source file.
   • outputfile is the path and file name of the generated file. If a path is not specified, the file is output to the current directory.


Run FilterStreamToFile on UNIX

To run FilterStreamToFile on UNIX
1. In the FilterStreamToFile.csh file, set the following variables.
   INSTALL_DIR: The absolute path of the KeyView Filter SDK installation directory.
   PLATFORM: The platform name.
2. Run the C shell file in the directory install/javaapi/sample, where install is the path name of the Filter installation directory. Type the following:
   ./FilterStreamToFile.csh inputfile outputfile
   where,
   • inputfile is the path and file name of the source file.
   • outputfile is the path and file name of the generated file. If a path is not specified, the file is output to the current directory.

FilterStreamToStream
The FilterStreamToStream program filters an input stream to an output stream using Java API methods in Filter. It demonstrates the following functions:
• creates an input and an output stream. Filters the input stream to the output stream.
• extracts file format information (document type, format, version, and so on) if available in the source document.
• extracts metadata if available in the source document. This program extracts all the metadata from the document, but only displays the first element of metadata.
NOTE: In KeyView version 12.9 and later, Micro Focus recommends that you implement a com.verity.api.SeekableInputStream. See Input/Output Operations, on page 23.


Run FilterStreamToStream on Windows

To run FilterStreamToStream on Windows
1. In the FilterStreamToStream.bat file, set the following variables.
   INSTALL_DIR: The absolute path of the KeyView Filter SDK installation directory.
   PLATFORM: The platform name.
2. Run the batch file in the directory install\javaapi\sample, where install is the path name of the Filter installation directory. Type the following:
   filterstreamtostream inputfile
   where,
   • inputfile is the path and file name of the source file.
   • The generated text is output to the current DOS prompt.

Run FilterStreamToStream on UNIX

To run FilterStreamToStream on UNIX
1. In the FilterStreamToStream.csh file, set the following variables.
   INSTALL_DIR: The absolute path of the KeyView Filter SDK installation directory.
   PLATFORM: The platform name.
2. Run the C shell file in the directory install/javaapi/sample, where install is the path name of the Filter installation directory. Type the following:
   ./FilterStreamToStream.csh inputfile
   where,
   • inputfile is the path and file name of the source file.
   • The generated text is output to the current console (standard out).

FilterTest
The FilterTest program demonstrates most of the Filtering methods available in the Java API. It filters an input document to an output document and enables you to specify command-line options. The command-line options are listed in Options for FilterTest Sample Program, on the next page.


To run FilterTest
1. Add the location of the javaapi\KeyView.jar file, the javaapi\sample directory, and the Filter bin directory to the CLASSPATH environment variable.
2. Type the following command line:
   java -Djava.library.path=bin_directory FilterTest [options] bin_directory input_file output_file
   where,
   • bin_directory is the path to the Filter bin directory.
   • options is one or more of the options listed in Options for FilterTest Sample Program, below.
   • input_file is the path and file name of the source file.
   • output_file is the path and file name of the generated file. If a path is not specified, the file is output to the current directory.

Options for FilterTest Sample Program

Option | Description
-is | Sets the input as a stream. The default is file.
-os | Sets the output as a stream. The default is file.
-chunk | Filters an input source and returns one chunk of output data. The program calls the filter method repeatedly until the entire output buffer is processed.
-docformat filename | Extracts the file format information and writes it to a file. filename is the name of the file to which the format information is written.
-summary filename | Extracts the metadata and writes it to a file. filename is the name of the file to which the metadata is written. See Extract Metadata, on page 59.
-getTargetCS | Extracts the character set used in the output file to the standard output.
-c charset | Sets the character set of the output file. Use the option -getTargetCS to determine whether the target character set specified is used in the output file. charset is a character set defined in the Filter class. See Coded Character Sets, on page 215.
-cs charset | Sets the character set of the source file. charset is a character set defined in the Filter class. See Coded Character Sets, on page 215.
-rc character | Sets a replacement character for characters that cannot be mapped. The default is a question mark (?).
-ip | Runs Filter in the same process as the calling application (in process). See Run Filter In Process, on page 27.
-ooplog | Enables error logging. See Enable or Disable Error Logging, on page 56. Error logs are not generated when in-process filtering is enabled.
-oopmem | Enables the memory trace system in the error logs. The memory trace system reports memory leaks and memory overwrites in the log file. See Report Memory Errors, on page 57. Error logs are not generated when in-process filtering is enabled.
-hf | Extracts headers and footers, as well as the body text.
-hftags | Puts tags around header and footer data.
-lo | Specifies that PowerPoint PPT97 and PPTX file text data is output in a logical reading order.
-lsbmsb | Uses LSBMSB byte order for Unicode text. LSBMSB is the "Least Significant Byte Most Significant Byte," or in other words, the byte order for Little Endian systems.
-msblsb | For Unicode text, uses MSBLSB byte order. MSBLSB is the "Most Significant Byte Least Significant Byte," or in other words, the byte order for Big Endian systems.
-bomarker | Generates the byte order marker for Unicode text.
-nodefcsconv | Prevents default conversion of document character encoding. See Prevent the Default Conversion of a Character Set, on page 64.
-x xmlconfigfile | Filters an XML file using customized extraction settings defined in the kvxconfig.ini file. If you do not enter the full path to the INI file, the program looks for the file in the current working directory. See Filter XML Files, on page 76.
-z tempdirectory | Specifies a temporary directory where temporary files generated by the filtering process are stored. The default is the current working directory. On Windows systems, there is a 64 K size limit to the temp directory. Once the limit is reached, you must either create a new directory or delete the contents of the existing directory; otherwise, you might receive an error message.
-ps password | Specifies a password to open a password-protected PST file. This uses the Container API which is obsolete.
-pdflorder orderFlag | Specifies that PDF files are output in a logical reading order. The parameter orderFlag is one of the following: ltr (left-to-right paragraph direction); rtl (right-to-left paragraph direction); auto (the PDF filter determines the paragraph direction, left-to-right or right-to-left, for each PDF page, and then sets the direction accordingly); raw (unstructured paragraph flow). See Filter PDF Files, on page 64.
-rm | If you set this option, text that was deleted from a document with revision tracking enabled is extracted from the document and included in the filtered output. See Extract Tracked Deleted Text, on page 64.
-embeddedfont | If you set this option, text that contains embedded fonts is not filtered from PDF documents. See Filter PDF Files, on page 64.
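The effect of the -lsbmsb, -msblsb, and -bomarker options in the table above can be pictured with the standard Java charsets. The following sketch encodes the letter A (U+0041) in both byte orders and with a byte order marker. It assumes that "Unicode text" here means two-byte UTF-16 code units, which this section does not state explicitly, and it uses only java.nio classes, not the KeyView API.

```java
import java.nio.charset.StandardCharsets;

// Standard-library illustration of byte order; not KeyView code.
final class ByteOrderSketch {
    // Formats bytes as space-separated two-digit hexadecimal values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String text = "A"; // U+0041
        // Least significant byte first (Little Endian): 41 00
        System.out.println(hex(text.getBytes(StandardCharsets.UTF_16LE)));
        // Most significant byte first (Big Endian): 00 41
        System.out.println(hex(text.getBytes(StandardCharsets.UTF_16BE)));
        // Java's UTF_16 encoder writes a big-endian byte order marker first: FE FF 00 41
        System.out.println(hex(text.getBytes(StandardCharsets.UTF_16)));
    }
}
```

A reader that finds FE FF at the start of the data knows the text is big-endian; FF FE indicates little-endian.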


Part III: Appendixes
This section lists supported formats, supported character sets, and redistributed files, and provides information on format detection and developing a custom document reader.
• Supported Formats
• Document Readers
• Platform Differences
• Character Sets
• Extract and Format Lotus Notes Subfiles
• File Format Detection
• List of Required Files for Redistribution
• Develop a Custom Reader
• Password Protected Files
• OCR Supported Languages


Appendix A: Supported Formats

This section lists the file formats that KeyView can detect.

· Key to Supported Formats Table 103
· Supported Formats 105
· File Classes 171

Key to Supported Formats Table

The supported formats table includes the following information:

Column: Format Name
Description: The format name that is returned by KeyView format detection.
• In the C API, these values are defined in the ENdocFmt enumeration in adDocFmt.h.
• In the .NET API these values are defined in the Autonomy.API.Filter.DocFormat enumeration.
• In the Java API these values are defined in the com.verity.api.DocFormat enumeration.
• In the C++ API these values are defined in keyview::Format, used in DetectionInfo which is returned by Session::detect().

Column: Number
Description: The format number that is returned by KeyView format detection. This is the value associated with the Format Name in the relevant enumeration.

Column: Category
Description: This value is used in the KeyView configuration file formats.ini to specify the reader to use to filter, export, or view the format. Several formats might have the same category value.

Column: Description
Description: A short description of the file format.

Column: MIME Type
Description: The MIME type (if any).

Column: Extension
Description: A list of common file extensions for the file format.
NOTE: This is not a complete list of file extensions. KeyView does not distinguish between file types based on their extension. Instead, it detects the file format based on the file content. This is more reliable because content cannot always be predicted from the file extension, and because some file extensions are associated with multiple formats.

Column: File Class
Description: The KeyView file class.
• In the C API, these values are defined in the ENdocClass enumeration in adinfo.h.
• In the .NET API these values are defined in the Autonomy.API.Filter.DocClass enumeration.
• In the Java API these values are defined in the com.verity.api.DocClass enumeration.
• In the C++ API these values are defined in keyview::Category, used in DetectionInfo which is returned by Session::detect().
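The note on the Extension column says that formats are detected from file content rather than from the file name. The following sketch illustrates the general idea with a few well-known file signatures. It is not KeyView's detection logic, which is not described in this section; the class name and the returned labels are hypothetical and are not KeyView format names.

```java
import java.nio.charset.StandardCharsets;

// Illustrative signature check; KeyView's own detection is far more thorough.
final class MagicBytesSketch {
    // True if 'data' begins with the ASCII bytes of 'prefix'.
    static boolean startsWith(byte[] data, String prefix) {
        byte[] expected = prefix.getBytes(StandardCharsets.US_ASCII);
        if (data.length < expected.length) {
            return false;
        }
        for (int i = 0; i < expected.length; i++) {
            if (data[i] != expected[i]) {
                return false;
            }
        }
        return true;
    }

    // Guesses a format label from the leading bytes, ignoring any file name.
    static String guess(byte[] data) {
        if (startsWith(data, "%PDF-")) return "PDF";
        if (startsWith(data, "GIF87a")) return "GIF87a";
        if (startsWith(data, "GIF89a")) return "GIF89a";
        if (startsWith(data, "{\\rtf")) return "RTF";
        return "unknown";
    }

    public static void main(String[] args) {
        // Content of a file that might be misleadingly named report.txt.
        byte[] content = "%PDF-1.7\n...".getBytes(StandardCharsets.US_ASCII);
        System.out.println(guess(content));
    }
}
```

A file named report.txt that begins with %PDF- is still reported as PDF, which is the behavior the note describes.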


Supported Formats

Format Name | Number | Category | Description | MIME Type | File Class
Reserved__Fmt | -1 | -1 | | | AutoDetNoFormat
Unknown_Fmt | 0 | 0 | | | AutoDetNoFormat
AES_Multiplus_Comm_Fmt | 1 | 1 | Multiplus (AES) | | adWORDPROCESSOR
ASCII_Text_Fmt | 2 | 2 | Plain Text file | text/plain | adWORDPROCESSOR
MSDOS_Batch_File_Fmt | 3 | 2 | MS-DOS Batch File | application/x-bat | adEXECUTABLE
Applix_Alis_Fmt | 4 | 3 | Applix Asterix | | adWORDPROCESSOR
BMP_Fmt | 5 | 4 | Windows Bitmap Image (BMP) | image/bmp | adRASTERIMAGE
CT_DEF_Fmt | 6 | 5 | Convergent Technologies DEF Comm. Format | | adWORDPROCESSOR
Corel_Draw_Fmt | 7 | 6 | CorelDRAW (up to version 13/X3) | application/coreldraw | adVECTORGRAPHIC
CGM_ClearText_Fmt | 8 | 8 | Computer Graphics Metafile (CGM) | | adVECTORGRAPHIC
CGM_Binary_Fmt | 9 | 8 | Computer Graphics Metafile (CGM) | image/cgm | adVECTORGRAPHIC
CGM_Character_Fmt | 10 | 8 | Computer Graphics Metafile (CGM) | | adVECTORGRAPHIC
Word_Connection_Fmt | 11 | 9 | Word Connection | | adWORDPROCESSOR
COMET_TOP_Word_Fmt | 12 | 10 | Nixdorf COMET TOP Financial Accounting software | | adWORDPROCESSOR
CEOwrite_Fmt | 13 | 11 | CEOwrite | | adWORDPROCESSOR
DSA101_Fmt | 14 | 12 | DSA101 (Honeywell Bull) | | adWORDPROCESSOR
DCA_RFT_Fmt | 15 | 13 | IBM DCA-RFT (Revisable Form) | application/dca-rft | adWORDPROCESSOR
CDA_DDIF_Fmt | 16 | 14 | CDA / DDIF | | adWORDPROCESSOR
DG_CDS_Fmt | 17 | 16 | DG Common Data Stream (CDS) | | adWORDPROCESSOR
Micrografx_Draw_Fmt | 18 | 18 | Windows Draw (Micrografx) | image/x-mgx-dsf | adVECTORGRAPHIC
Data_Point_VistaWord_Fmt | 19 | 19 | Vistaword | | adWORDPROCESSOR
DECdx_Fmt | 20 | 20 | DEC WPS Plus DX format | application/dec-dx | adWORDPROCESSOR
Enable_WP_Fmt | 21 | 21 | Enable Word Processing | application/ewp | adWORDPROCESSOR
EPSF_Fmt | 22 | 22 | Encapsulated PostScript | application/postscript | adRASTERIMAGE, adVECTORGRAPHIC
Preview_EPSF_Fmt | 23 | 22 | Encapsulated PostScript | application/postscript | adRASTERIMAGE, adVECTORGRAPHIC
MS_Executable_Fmt | 24 | 23 | MSDOS/Windows executable | application/x-msdownload | adEXECUTABLE
G31D_Fmt | 25 | 24 | CCITT G3 1D | | adRASTERIMAGE
GIF_87a_Fmt | 26 | 25 | Graphics Interchange Format (GIF87a) | image/gif | adRASTERIMAGE
GIF_89a_Fmt | 27 | 25 | Graphics Interchange Format (GIF89a) | image/gif | adRASTERIMAGE
HP_Word_PC_Fmt | 28 | 26 | HP Word PC | | adWORDPROCESSOR
IBM_1403_LinePrinter_Fmt | 29 | 27 | IBM 1403 Line Printer | | adWORDPROCESSOR
IBM_DCF_Script_Fmt | 30 | 28 | DCF Script | | adWORDPROCESSOR
IBM_DCA_FFT_Fmt | 31 | 29 | DCA-FFT (IBM Final Form) | text/x-ibm-fft | adWORDPROCESSOR
Interleaf_Fmt | 32 | 30 | Interleaf | | adWORDPROCESSOR
GEM_Image_Fmt | 33 | 31 | GEM Bit Image | | adRASTERIMAGE
IBM_Display_Write_Fmt | 34 | 32 | IBM DisplayWrite | application/x-displaywrite | adWORDPROCESSOR
Sun_Raster_Fmt | 35 | 33 | Sun Raster image | image/x-cmu-raster | adRASTERIMAGE
Ami_Pro_Fmt | 36 | 35 | Lotus Ami Pro | application/x-lotus-amipro | adWORDPROCESSOR
Ami_Pro_StyleSheet_Fmt | 37 | 35 | Lotus Ami Pro Style Sheet | | adWORDPROCESSOR
MORE_Fmt | 38 | 36 | MORE Database MAC | | adOUTLINE
Lyrix_Fmt | 39 | 37 | Lyrix Word Processing | | adWORDPROCESSOR
MASS_11_Fmt | 40 | 38 | MASS-11 | application/x-mass-11 | adWORDPROCESSOR
MacPaint_Fmt | 41 | 39 | MacPaint | image/x-macpaint | adRASTERIMAGE
MS_Word_Mac_Fmt | 42 | 40 | Microsoft Word for Macintosh (up to version 3) | application/msword | adWORDPROCESSOR
SmartWare_II_Comm_Fmt | 43 | 41 | SmartWare II | | adCOMMUNICATION
MS_Word_Win_Fmt | 44 | 42 | Microsoft Word for Windows (up to version 6) | application/msword | adWORDPROCESSOR
Multimate_Fmt | 45 | 43 | MultiMate | application/x-multimate | adWORDPROCESSOR
Multimate_Fnote_Fmt | 46 | 43 | MultiMate Footnote File | application/x-multimate-note | adWORDPROCESSOR
Multimate_Adv_Fmt | 47 | 43 | MultiMate Advantage | | adWORDPROCESSOR
Multimate_Adv_Fnote_Fmt | 48 | 43 | MultiMate Advantage Footnote File | | adWORDPROCESSOR

Extension
PTF TXT BAT AX BMP
CDR CGM CGM CGM CN
CW
RFT, DC DDIF CDS DRW DV DX WPF EPS
EXE GIF
GIF
HW I4 IC IF, FFT
IMG IP RAS, RS, SUN SAM
M1, M11 MAC, PIC, PNTG DOC
DOC, WPS
MM MMFN

Readers
afsr afsr axsr bmpsr, kpbmprdr cdsr
kpcdrrdr kpcgmrdr kpcgmrdr kpcgmrdr stringssr
stringssr stringssr dcasr
stringssr
stringssr
stringssr kpepsrdr
kpepsrdr exesr gifsr, kpgifrdr
gifsr, kpgifrdr
stringssr
stringssr
dw4sr kpsunrdr lasr lasr
stringssr stringssr kpmacrdr mbsr
misr
stringssr stringssr stringssr stringssr


Format Name | Number | Category | Description | MIME Type | File Class
Multimate_Adv_II_Fmt | 49 | 43 | MultiMate Advantage II | | adWORDPROCESSOR
Multimate_Adv_II_Fnote_Fmt | 50 | 43 | MultiMate Advantage II Footnote File | | adWORDPROCESSOR
Multiplan_PC_Fmt | 51 | 44 | Microsoft Multiplan (PC) | application/x-ms-multiplan | adSPREADSHEET
Multiplan_Mac_Fmt | 52 | 44 | Microsoft Multiplan (Mac) | application/x-ms-multiplan | adSPREADSHEET
MS_RTF_Fmt | 53 | 45 | Rich Text Format (RTF) | application/rtf | adWORDPROCESSOR
MS_Word_PC_Fmt | 54 | 46 | Microsoft Word for PC (up to version 6) | application/x-ms-wordpc | adWORDPROCESSOR
MS_Word_PC_StyleSheet_Fmt | 55 | 46 | Microsoft Word for PC (up to version 6) Style Sheet | | adWORDPROCESSOR
MS_Word_PC_Glossary_Fmt | 56 | 46 | Microsoft Word for PC (up to version 6) Glossary | | adWORDPROCESSOR
MS_Word_PC_Driver_Fmt | 57 | 46 | Microsoft Word for PC (up to version 6) Driver | | adWORDPROCESSOR
MS_Word_PC_Misc_Fmt | 58 | 46 | Microsoft Word for PC (up to version 6) Miscellaneous File | | adWORDPROCESSOR
NBI_Async_Archive_Fmt | 59 | 47 | NBI Async Archive Format | | adWORDPROCESSOR
Navy_DIF_Fmt | 60 | 48 | Navy DIF (document interchange format) | application/x-navy | adWORDPROCESSOR
NBI_Net_Archive_Fmt | 61 | 49 | NBI OASys Net Archive Format | | adWORDPROCESSOR
NIOS_TOP_Fmt | 62 | 50 | NIOS TOP | | adWORDPROCESSOR
FileMaker_Mac_Fmt | 63 | 51 | Filemaker MAC | | adDATABASE
ODA_Q1_11_Fmt | 64 | 52 | ODA / ODIF Q1 11 | | adWORDPROCESSOR
ODA_Q1_12_Fmt | 65 | 52 | ODA / ODIF Q1 12 | | adWORDPROCESSOR
OLIDIF_Fmt | 66 | 53 | OLIDIF (Olivetti) | | adWORDPROCESSOR
Office_Writer_Fmt | 67 | 55 | Office Writer | | adWORDPROCESSOR
PC_Paintbrush_Fmt | 68 | 56 | PC Paintbrush Graphics (PCX) | image/vnd.zbrush.pcx | adRASTERIMAGE
CPT_Comm_Fmt | 69 | 57 | CPT Corporation word processor | | adWORDPROCESSOR
Lotus_PIC_Fmt | 70 | 58 | Lotus PIC | image/x-pict | adVECTORGRAPHIC
Mac_PICT_Fmt | 71 | 59 | Macintosh Raster / QuickDraw Picture | image/x-pict | adRASTERIMAGE, adVECTORGRAPHIC
Philips_Script_Word_Fmt | 72 | 60 | Philips Script | | adWORDPROCESSOR
PostScript_Fmt | 73 | 61 | PostScript | application/postscript | adVECTORGRAPHIC

Extension
FBX, FNX
RTF MW
ND NN FP5, FP7 OD OD OW PCX PF PIC PCT
PS

Readers
stringssr stringssr
rtfsr mwsr
mwsr
mwsr
mwsr
mwsr
stringssr
nnsr
stringssr stringssr
stringssr kppcxrdr stringssr kppicrdr kppctrdr


Format Name PRIMEWORD_Fmt Quadratron_Q_One_v1_Fmt Quadratron_Q_One_v2_Fmt SAMNA_Word_IV_Fmt Ami_Pro_Draw_Fmt
SYLK_Spreadsheet_Fmt SmartWare_II_WP_Fmt
Symphony_Fmt Targa_Fmt TIFF_Fmt
Targon_Word_Fmt Uniplex_Ucalc_Fmt Uniplex_WP_Fmt MS_Word_UNIX_Fmt WANG_PC_Fmt WordERA_Fmt WANG_WPS_Comm_Fmt
WordPerfect_Mac_Fmt WordPerfect_Fmt WordPerfect_VAX_Fmt WordPerfect_Macro_Fmt WordPerfect_Dictionary_Fmt WordPerfect_Thesaurus_Fmt WordPerfect_Resource_Fmt WordPerfect_Driver_Fmt WordPerfect_Cfg_Fmt WordPerfect_Hyphenation_Fmt

Number 74 75 76 77 78

Category 62 63 64 65 66

Description PRIMEWORD Q-One V1.93J Q-One V2.0 SAMNA Word Lotus Ami Pro Draw

MIME Type

79

67

80

68

81

69

82

70

83

71

SYmbolic LinK (SYLK) format Informix SmartWare II word processor Lotus Symphony spreadsheet Truevision Targa image Tagged Image File Format (TIFF)

application/vnd.symphony image/x-tga image/tiff

84

72

85

73

86

74

87

75

88

76

89

77

90

78

91

79

92

86

93

139

94

139

95

139

96

139

97

139

98

139

99

139

100

139

Targon Word Uniplex Ucalc Uniplex word processor Microsoft Word UNIX Wang IWP for PC WordERA WANG WPS (Word Processing System) WordPerfect MAC WordPerfect version 4 WordPerfect VAX WordPerfect Macro WordPerfect Spelling Dictionary WordPerfect Thesaurus WordPerfect Resource File WordPerfect Driver WordPerfect Configuration File WordPerfect Hyphenation Dictionary

application/msword application/x-wang-iwp
application/x-corel-wordperfect application/x-corel-wordperfect application/x-corel-wordperfect application/vnd.wordperfect application/vnd.wordperfect application/vnd.wordperfect application/vnd.wordperfect application/vnd.wordperfect application/vnd.wordperfect application/vnd.wordperfect


Extension
Q1, QX Q1, QX SAM SDW
SLK DOC, SMT
WR1 TGA TIF, TIFF
TW SS UP
DOC DC, GL, FR WF
WP, WP4
MRS SPW
WWK, PRS IRS, VRS PFX HYC

File Class adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adVECTORGRAPHIC, adRASTERIMAGE adSPREADSHEET adWORDPROCESSOR

Readers pwsr stringssr stringssr stringssr kpsdwrdr
swsr

adSPREADSHEET adRASTERIMAGE adRASTERIMAGE, adFAXFORMAT adWORDPROCESSOR adSPREADSHEET adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR

kpTGArdr kptifrdr, tifsr stringssr stringssr
stringssr stringssr

adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR

wpmsr stringssr


Format Name WordPerfect_Misc_Fmt WordMARC_Fmt Windows_Metafile_Fmt
Windows_Metafile_NoHdr_Fmt SmartWare_II_DB_Fmt WordPerfect_Graphics_Fmt
WordStar_Fmt WANG_WITA_Fmt Xerox_860_Comm_Fmt Xerox_Writer_Fmt DIF_SpreadSheet_Fmt Enable_Spreadsheet_Fmt SuperCalc_Fmt UltraCalc_Fmt SmartWare_II_SS_Fmt SOF_Encapsulation_Fmt PowerPoint_Win_Fmt
PowerPoint_Mac_Fmt
PowerPoint_95_Fmt PowerPoint_97_Fmt PageMaker_Mac_Fmt PageMaker_Win_Fmt MS_Works_Mac_WP_Fmt
MS_Works_Mac_DB_Fmt MS_Works_Mac_SS_Fmt
MS_Works_Mac_Comm_Fmt

Number 101 102 103

Category 139 82 83

Description WordPerfect Miscellaneous File WordMARC Composer Windows Metafile

MIME Type application/vnd.wordperfect video/x-ms-wm image/wmf

104

83

105

84

106

195

107

87

108

88

109

89

110

91

111

92

112

93

113

94

114

95

115

96

116

97

117

98

118

99

119

212

120

272

121

100

122

101

123

103

124

104

125

105

126

106

Windows Metafile (no header)

image/wmf

Informix SmartWare II database

database/x-smartdata

WordPerfect Graphics (version 2 and higher)

application/vnd.wordperfect

WordStar

application/vnd.wordstar

WANG WITA

Xerox 860

Xerox Writer

Data Interchange Format (DIF)

application/dif+xml

Enable Spreadsheet

application/vnd.epson.ssf

Sorcim SuperCalc spreadsheet

application/x-supercalc5

UltraCalc spreadsheet

Informix SmartWare II spreadsheet

application/x-smartware

Serialized Object Format (SOF)

application/java-serialized-object

Microsoft PowerPoint PC (up to version 4)

application/x-ms-powerpoint

Microsoft PowerPoint MAC (up to version 4)

application/x-ms-powerpoint

Microsoft PowerPoint 95

application/x-ms-powerpoint

Microsoft PowerPoint 97

application/x-ms-powerpoint

PageMaker for Macintosh

PageMaker for Windows

Microsoft Works Word Processor for MAC

application/x-msworks

Microsoft Works Database for MAC

application/x-msworks

Microsoft Works Spreadsheet for MAC

application/x-msworks

Microsoft Works Communication for MAC

application/x-msworks


Extension WM, PW WMF WMF WPG, QPG WS, WSD WT
DIF SSF CAL
SOF PPT PPT PPT PPT
MWK

File Class adWORDPROCESSOR adWORDPROCESSOR adVECTORGRAPHIC, adRASTERIMAGE adVECTORGRAPHIC adDATABASE adRASTERIMAGE, adVECTORGRAPHIC adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adSPREADSHEET adSPREADSHEET adSPREADSHEET adSPREADSHEET adSPREADSHEET adENCAPSULATION adPRESENTATION

Readers
stringssr kpwmfrdr kpwmfrdr
kpwg2rdr, kpwpgrdr stringssr stringssr stringssr stringssr difsr
kpp40rdr

adPRESENTATION

olesr

adPRESENTATION adPRESENTATION adDESKTOPPUBLSH adDESKTOPPUBLSH adWORDPROCESSOR

kpp95rdr kpp97rdr
stringssr

adDATABASE adSPREADSHEET

mwssr

adCOMMUNICATION


Format Name MS_Works_DOS_WP_Fmt
MS_Works_DOS_DB_Fmt MS_Works_DOS_SS_Fmt
MS_Works_Win_WP_Fmt
MS_Works_Win_DB_Fmt
MS_Works_Win_SS_Fmt
PC_Library_Fmt MacWrite_Fmt MacWrite_II_Fmt Freehand_Fmt
Disk_Doubler_Fmt HP_GL_Fmt FrameMaker_Fmt FrameMaker_Book_Fmt Maker_Markup_Language_Fmt Maker_Interchange_Fmt
JPEG_File_Interchange_Fmt
Reflex_Fmt Framework_Fmt Framework_II_Fmt Paradox_Fmt MS_Windows_Write_Fmt Quattro_Pro_DOS_Fmt Quattro_Pro_Win_Fmt Persuasion_Fmt

Number 127
128 129
130
131
132
133 134 135 136

Category 107
108 109
227
231
228
111 112 113 114

Description

MIME Type

Microsoft Works Word Processor for DOS

application/x-msworks

Microsoft Works Database for DOS

application/x-msworks

Microsoft Works Spreadsheet for DOS

application/x-msworks

Microsoft Works Word Processor for Windows (up to 2000)

application/x-msworks

Microsoft Works Database for Windows

application/x-msworks

Microsoft Works Spreadsheet for Windows

application/x-msworks

DOS/Windows Object Library

application/x-archive

MacWrite

application/macwriteii

MacWrite II

application/macwriteii

Adobe/Macrovision FreeHand image

image/x-freehand

137

115

138

116

139

136

140

136

141

174

142

117

143

118

Disk Doubler HP Graphics Language FrameMaker FrameMaker Book Maker Markup Language Adobe FrameMaker Interchange Format (MIF) JPEG File Interchange Format

vector/x-hpgl application/vnd.framemaker application/vnd.framemaker application/vnd.mif application/x-mif
image/jpeg

144

119

145

276

146

120

147

121

148

123

149

124

150

184

151

126

Borland Reflex database Framework office suite Framework II office suite Borland Paradox database Microsoft Windows Write Corel Quattro Pro for DOS Corel Quattro Pro for Windows Adobe Persuasion

database/reflex
application/paradox application/x-ms-write application/x-quattropro application/x-quattro-win


Extension WPS

File Class adWORDPROCESSOR

Readers stringssr

WDB

adDATABASE adSPREADSHEET

mwssr

WPS, W40

adWORDPROCESSOR msw6sr, mswsr

adDATABASE

WKS, S30, S40

adSPREADSHEET

mwssr

LIB, A
FH3, FH4, FH5, FH7, FH8, FH9, FH10, FH11
HPGL, HPG FM, FRM BOOK
MIF

adLIBRARY adWORDPROCESSOR adWORDPROCESSOR adVECTORGRAPHIC

stringssr stringssr

adENCAPSULATION adVECTORGRAPHIC adDESKTOPPUBLSH adDESKTOPPUBLSH adDESKTOPPUBLSH adWORDPROCESSOR

mifsr

JPG, JPEG, JFIF, JFI
FW3 DB WRI WQ1 WB1, WB2, WB3

adRASTERIMAGE

jpgsr, kpjpgrdr

adDATABASE adMIXED adMIXED adDATABASE adWORDPROCESSOR adSPREADSHEET adSPREADSHEET adPRESENTATION

mwsr qpssr


Format Name Windows_Icon_Fmt Windows_Cursor_Fmt MS_Project_Activity_Fmt
MS_Project_Resource_Fmt
MS_Project_Calc_Fmt
PKZIP_Fmt
Quark_Xpress_Fmt ARC_PAK_Archive_Fmt MS_Publisher_Fmt PlanPerfect_Fmt WordPerfect_Auxiliary_Fmt
MS_WAVE_Audio_Fmt MIDI_Audio_Fmt AutoCAD_DXF_Binary_Fmt
AutoCAD_DXF_Text_Fmt
dBase_Fmt OS_2_PM_Metafile_Fmt Lasergraphics_Language_Fmt AutoShade_Rendering_Fmt GEM_VDI_Fmt Windows_Help_Fmt Volkswriter_Fmt Ability_WP_Fmt Ability_DB_Fmt Ability_SS_Fmt Ability_Comm_Fmt

Number 152 153 154
155
156
157

Category 128 133 129
129
129
132

Description
Windows Icon Format
Windows Cursor
Microsoft Project (up to version 3) activity file
Microsoft Project (up to version 3) resource file
Microsoft Project (up to version 3) calc file
ZIP Archive

MIME Type image/vnd.microsoft.icon image/x-win-bitmap
application/zip

158

134

159

135

160

137

161

138

162

139

Quark Xpress MAC

PAK/ARC Archive

Microsoft Publisher (up to version 3)

application/x-mspublisher

PlanPerfect

Corel WordPerfect auxiliary file

163

141

164

142

165

143

166

143

Microsoft Wave audio

audio/wav

MIDI audio

audio/mid

Autodesk AutoCAD DXF binary format

image/x-dxf

Autodesk AutoCAD DXF text format

image/x-dxf

167

144

168

145

169

146

170

147

171

148

172

149

173

150

174

151

175

151

176

151

177

151

dBase Database III+/IV OS/2 PM Metafile Lasergraphics Language AutoShade Rendering GEM VDI Metafile image Windows Help File Volkswriter word processor Ability Word Processor Ability Database Ability Spreadsheet Ability Presentation

application/x-dbf application/x-autoshade application/winhlp


Extension ICO CUR
ZIP, ZIPX
ARC, PAK PUB WPW WAV MID, MIDI DXF DXF DBF, VCX MET
GEM, GDI HLP VW4

File Class adRASTERIMAGE adRASTERIMAGE adSCHEDULE

Readers kpicordr

adSCHEDULE

adSCHEDULE

adENCAPSULATION, adEXECUTABLE adDESKTOPPUBLSH adENCAPSULATION adDESKTOPPUBLSH adSCHEDULE adMISC, adENCAPSULATION adSOUND adSOUND adVECTORGRAPHIC
adVECTORGRAPHIC
adDATABASE adVECTORGRAPHIC adVECTORGRAPHIC adVECTORGRAPHIC adVECTORGRAPHIC adMISC adWORDPROCESSOR adWORDPROCESSOR adDATABASE adSPREADSHEET adCOMMUNICATION

unzip
mspubsr
MCI, riffsr MCI kpDXFrdr, kpODArdr kpDXFrdr, kpODArdr dbfsr
stringssr


Format Name Ability_Image_Fmt XyWrite_Fmt CSV_Fmt IBM_Writing_Assistant_Fmt WordStar_2000_Fmt HP_PCL_Fmt
UNIX_Exe_PreSysV_VAX_Fmt
UNIX_Exe_Basic_16_Fmt UNIX_Exe_x86_Fmt UNIX_Exe_iAPX_286_Fmt UNIX_Exe_MC68k_Fmt UNIX_Exe_3B20_Fmt UNIX_Exe_WE32000_Fmt UNIX_Exe_VAX_Fmt UNIX_Exe_Bell_5_Fmt UNIX_Obj_VAX_Demand_Fmt UNIX_Obj_MS8086_Fmt UNIX_Obj_Z8000_Fmt AU_Audio_Fmt NeWS_Font_Fmt cpio_Archive_CRChdr_Fmt cpio_Archive_CHRhdr_Fmt PEX_Binary_Archive_Fmt Sun_vfont_Fmt Curses_Screen_Fmt UUEncoded_Fmt WriteNow_Fmt PC_Obj_Fmt Windows_Group_Fmt

Number 178 179 180 181 182 183
184
185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206

Category 151 152 153 154 155 157
158
158 158 158 158 158 158 158 158 159 159 159 161 162 163 163 164 165 166 167 168 169 170

Description

MIME Type

Ability Image

XYWrite / Nota Bene

CSV (Comma Separated Values)

text/csv

IBM Writing Assistant

WordStar 2000

HP Printer Command Language (PCL)

application/pcl

UNIX executable (PDP-11/preSystem V VAX)

application/octet-stream

UNIX executable (Basic-16)

application/octet-stream

UNIX executable (x86)

application/octet-stream

UNIX executable (iAPX 286)

application/octet-stream

UNIX executable (MC680x0)

application/octet-stream

UNIX executable (3B20)

application/octet-stream

UNIX executable (WE32000)

application/octet-stream

UNIX executable (VAX)

application/octet-stream

UNIX executable (Bell 5.0)

application/octet-stream

UNIX object module (VAX Demand)

UNIX object module (old MS 8086)

UNIX object module (Z8000)

NeXT/Sun Audio Data

audio/basic

NeWS bitmap font

cpio archive (CRC Header)

application/x-cpio

cpio archive (CHR Header)

application/x-cpio

SUN PEX Binary Archive

SUN vfont Definition

Curses Screen Image

UU-encoded text

text/x-uuencode

WriteNow MAC

DOS/Windows Object Module

application/octet-stream

Windows Group


Extension XY4 CSV IWA WS2 PCL, PRN
AU, SND CPIO CPIO
UUE OBJ, EXP GRP

File Class adRASTERIMAGE adWORDPROCESSOR adSPREADSHEET adWORDPROCESSOR adWORDPROCESSOR adVECTORGRAPHIC

Readers
xywsr csvsr stringssr stringssr

adEXECUTABLE

adEXECUTABLE adEXECUTABLE adEXECUTABLE adEXECUTABLE adEXECUTABLE adEXECUTABLE adEXECUTABLE adEXECUTABLE adOBJECTMODULE adOBJECTMODULE adOBJECTMODULE adSOUND adFONT adENCAPSULATION adENCAPSULATION adENCAPSULATION adFONT adRASTERIMAGE adENCAPSULATION adWORDPROCESSOR adOBJECTMODULE adMISC

MCI
uudsr stringssr


Format Name TrueType_Font_Fmt Windows_PIF_Fmt MS_COM_Executable_Fmt StuffIt_Fmt PeachCalc_Fmt Wang_GDL_Fmt Q_A_DOS_Fmt Q_A_Win_Fmt WPS_PLUS_Fmt DCX_Fmt OLE_Fmt EBCDIC_Fmt DCS_Fmt UNIX_SHAR_Fmt Lotus_Notes_BitMap_Fmt Lotus_Notes_CDF_Fmt Compress_Fmt GZ_Compress_Fmt TAR_Fmt ODIF_FOD26_Fmt
ODIF_FOD36_Fmt
ALIS_Fmt Envoy_Fmt PDF_Fmt
BinHex_Fmt SMTP_Fmt MIME_Fmt USENET_Fmt

Number 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226
227
228 229 230

Category 171 172 173 175 176 177 179 180 181 182 183 186 187 190 191 193 192 198 194 196
196
197 199 200

Description

MIME Type

TrueType Font

application/x-font-ttf

Program Information File (PIF)

application/octet-stream

PC (.COM)

application/octet-stream

StuffIt (MAC)

application/x-stuffit

PeachCalc

WANG Office GDL Header

Symantec Q&A for DOS

application/x-qa-write

Symantec Q&A for Windows

application/x-qa-write

WPS-PLUS

application/vnd.ms-wpl

DCX FAX Format (PCX images)

image/dcx

OLE Compound Document

EBCDIC Text

application/ebcdic

DCS

SHAR shell archive format

application/x-shar

Lotus Notes Bitmap

Lotus Notes CDF

application/cdf

UNIX Compress archive

application/x-compress

GZ Compress archive

application/gzip

TAR (tape archive)

application/tar

Open Document Architecture (ODA / ODIF) FOD26

application/oda

Open Document Architecture (ODA / ODIF) FOD36

application/oda

ALIS

WordPerfect Envoy

application/envoy

Adobe PDF (Portable Document Format)

application/pdf

231

206

232

207

233

208

234

264

BinHex

application/mac-binhex40

SMTP (Text Mail / Outlook Express)

message/rfc822

MIME (EML / MBX email)1

message/rfc822

USENET

message/news


Extension TTF PIF COM HQX CAL
JW WPL DCX OLE
SHAR
CDF Z GZ TAR F26 F36
EVY PDF
HQX SMTP EML, MBX

File Class adFONT adMISC adEXECUTABLE adENCAPSULATION adSPREADSHEET adENCAPSULATION adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adFAXFORMAT adENCAPSULATION adWORDPROCESSOR adWORDPROCESSOR adENCAPSULATION adRASTERIMAGE adWORDPROCESSOR adENCAPSULATION adENCAPSULATION adENCAPSULATION adWORDPROCESSOR

Readers
stringssr stringssr stringssr kpdcxrdr olesr
stringssr kvzee, kvzeesr kvgz, kvgzsr tarsr

adWORDPROCESSOR

adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR
adENCAPSULATION adENCAPSULATION adENCAPSULATION adWORDPROCESSOR

kppdf2rdr, kppdfrdr, pdf2sr, pdfsr
kvhqxsr
emlsr
mbxsr


Format Name SGML_Fmt HTML_Fmt ACT_Fmt PNG_Fmt MS_Video_Fmt Windows_Animated_Cursor_Fmt Windows_CPP_Obj_Storage_Fmt Windows_Palette_Fmt RIFF_DIB_Fmt RIFF_MIDI_Fmt RIFF_Multimedia_Movie_Fmt MPEG_Fmt QuickTime_Fmt AIFF_Fmt
Amiga_MOD_Fmt Amiga_IFF_8SVX_Fmt Creative_Voice_Audio_Fmt AutoDesk_Animator_FLI_Fmt AutoDesk_AnimatorPro_FLC_Fmt Compactor_Archive_Fmt VRML_Fmt QuickDraw_3D_Metafile_Fmt PGP_Secret_Keyring_Fmt PGP_Public_Keyring_Fmt PGP_Encrypted_Data_Fmt PGP_Signed_Data_Fmt PGP_SignedEncrypted_Data_Fmt PGP_Sign_Certificate_Fmt PGP_Compressed_Data_Fmt PGP_ASCII_Public_Keyring_Fmt

Number 235 236 237 238 239 240 241 242 243 244 245 246 247 248
249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264

Category 209 210 211 213 214 215 216 217 218 219 220 221 222 223
224 225 226 229 230 233 234 235 236 237 238 239 240 241 246 242

Description SGML HTML ACT! CRM software Portable Network Graphics (PNG) Video for Windows (AVI) Windows Animated Cursor Windows C++ Object Storage Windows Palette RIFF Device Independent Bitmap RIFF MIDI RIFF Multimedia Movie MPEG Movie QuickTime Movie, MPEG-4 audio Audio Interchange File Format (AIFF) Amiga MOD Amiga IFF (8SVX) Sound Creative Voice (VOC) AutoDesk Animator FLIC AutoDesk Animator Pro FLIC Compactor / Compact Pro VRML QuickDraw 3D Metafile PGP secret key PGP public key PGP encrypted data PGP signed data PGP signed and encrypted data PGP signature certificate PGP compressed data ASCII-armored PGP public key

MIME Type text/sgml text/html
image/png video/avi
audio/midi
video/mpeg video/quicktime audio/aiff
audio/x-8svx
video/x-fli video/x-flc application/mac-compactpro model/vrml
application/pgp application/pgp application/pgp application/pgp application/pgp application/pgp-signature application/pgp application/pgp


Extension SGML HTM, HTML ACT PNG AVI ANI
PAL
RMI MMM
MOV, QT, MP4 AIF, AIFF, AIFC MOD IFF VOC FLI FLC
WRL
SIG
PGP

File Class adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adRASTERIMAGE adMOVIE adRASTERIMAGE adMIXED adRASTERIMAGE adRASTERIMAGE adSOUND adMOVIE adMOVIE adMOVIE adSOUND

Readers afsr htmsr kppngrdr, pngsr MCI kpanirdr
MCI, mpeg4sr MCI, aiffsr

adSOUND adSOUND adSOUND adANIMATION adANIMATION adENCAPSULATION adVECTORGRAPHIC adVECTORGRAPHIC adENCAPSULATION adENCAPSULATION adENCAPSULATION adENCAPSULATION adENCAPSULATION adENCAPSULATION adENCAPSULATION adENCAPSULATION


Format Name PGP_ASCII_Encoded_Fmt
PGP_ASCII_Signed_Fmt OLE_DIB_Fmt SGI_Image_Fmt Lotus_ScreenCam_Fmt MPEG_Audio_Fmt FTP_Software_Session_Fmt Netscape_Bookmark_File_Fmt Corel_Draw_CMX_Fmt AutoDesk_DWG_Fmt
AutoDesk_WHIP_Fmt Macromedia_Director_Fmt
Real_Audio_Fmt MSDOS_Device_Driver_Fmt Micrografx_Designer_Fmt SVF_Fmt Applix_Words_Fmt Applix_Graphics_Fmt MS_Access_Fmt MS_Access_95_Fmt MS_Access_97_Fmt MacBinary_Fmt Apple_Single_Fmt Apple_Double_Fmt Enhanced_Metafile_Fmt MS_Office_Drawing_Fmt XML_Fmt DeVice_Independent_Fmt

Number 265
266 267 268 269 270 271 272 273 274

Category 243
244 245 247 248 249 250 210 252 253

Description

MIME Type

ASCII-armored PGP-encoded message

application/pgp

ASCII-armored PGP signed

application/pgp

OLE DIB object

SGI RGB Image

image/sgi

Lotus ScreenCam

application/vnd.lotus-screencam

MPEG-1 Audio Layer 3 (MP3)

audio/mpeg

FTP Session Data

Netscape Bookmark File

text/html

Corel CMX

application/cmx

AutoDesk AutoCAD Drawing (DWG)

image/x-dwg

275

254

276

255

277

256

278

257

279

258

280

259

281

261

282

262

283

263

284

263

285

263

286

265

287

266

288

267

289

270

290

271

291

285

292

274

AutoDesk WHIP

Macromedia Shockwave/Adobe Director

application/x-director

Real Audio

audio/x-pn-realaudio

MSDOS Device Driver

application/octet-stream

Micrografx Designer

Simple Vector Format (SVF)

image/x-svf

Applix Words

application/x-applix-word

Applix Graphics

Microsoft Access (versions 1 and 2)

application/x-msaccess

Microsoft Access 95

application/msaccess

Microsoft Access 97

application/msaccess

MacBinary

application/x-macbinary

Apple Single

Apple Double

multipart/appledouble

Enhanced Metafile

image/x-emf

Microsoft Office Drawing

XML

text/xml

DeVice Independent file (DVI)

application/x-dvi


Extension

File Class adENCAPSULATION

Readers

adENCAPSULATION

adRASTERIMAGE

RGB

adRASTERIMAGE

SCM

adANIMATION

MPEGA, MPG, MP3 adSOUND

STE

adCOMMUNICATION

adWORDPROCESSOR

CMX

adVECTORGRAPHIC

DWG

adVECTORGRAPHIC

WHP DCR, DXR, DIR

adVECTORGRAPHIC adANIMATION

kpsgirdr
MCI, mp3sr
htmsr
kpDWGrdr, kpODArdr

RM, RA SYS DSF SVF AW AG MDB MDB MDB BIN
AD EMF
XML DVI

adSOUND adEXECUTABLE adVECTORGRAPHIC adVECTORGRAPHIC adWORDPROCESSOR adPRESENTATION adDATABASE adDATABASE adDATABASE adENCAPSULATION adENCAPSULATION adENCAPSULATION adVECTORGRAPHIC adVECTORGRAPHIC adWORDPROCESSOR adVECTORGRAPHIC

awsr kpagrdr mdbsr mdbsr mdbsr macbinsr
kpemfrdr kpmsordr xmlsr


Format Name Unicode_Fmt Lotus_123_Worksheet_Fmt
Lotus_123_Format_Fmt Lotus_123_97_Fmt Lotus_Word_Pro_96_Fmt Lotus_Word_Pro_97_Fmt Freelance_DOS_Fmt Freelance_Win_Fmt Freelance_OS2_Fmt Freelance_96_Fmt Freelance_97_Fmt MS_Word_95_Fmt MS_Word_97_Fmt Excel_Fmt Excel_Chart_Fmt
Excel_Macro_Fmt
Excel_95_Fmt Excel_97_Fmt Corel_Presentations_Fmt Harvard_Graphics_Fmt Harvard_Graphics_Chart_Fmt Harvard_Graphics_Symbol_Fmt Harvard_Graphics_Cfg_Fmt Harvard_Graphics_Palette_Fmt Lotus_123_R9_Fmt Applix_Spreadsheets_Fmt MS_Pocket_Word_Fmt
MS_DIB_Fmt

Number 293 294

Category 275 81

Description Unicode text file Lotus 1-2-3

MIME Type text/plain application/x-lotus-123

295

81

296

81

297

268

298

268

299

140

300

140

301

140

302

140

303

140

304

189

305

269

306

90

307

90

308

90

309

188

310

188

311

127

312

131

313

131

314

131

315

131

316

131

317

81

318

278

319

45

320

279

Lotus 1-2-3 Formatting

application/x-123

Lotus 1-2-3 97

application/x-lotus-123

Lotus Word Pro 96

application/vnd.lotus-wordpro

Lotus Word Pro 97

application/vnd.lotus-wordpro

Lotus Freelance for DOS

application/x-freelance

Lotus Freelance for Windows

application/x-freelance

Lotus Freelance for OS/2

application/x-freelance

Lotus Freelance 96

application/x-freelance

Lotus Freelance 97

application/x-freelance

Microsoft Word 95

application/msword

Microsoft Word 97

application/msword

Microsoft Excel (up to version 5)

application/x-ms-excel

Microsoft Excel (up to version 5) chart

application/x-ms-excel

Microsoft Excel (up to version 5) macro

application/vnd.ms-excel

Microsoft Excel 95

application/x-ms-excel

Microsoft Excel 97

application/x-ms-excel

Corel Presentations

application/x-corelpresentations

Harvard Graphics

Harvard Graphics Chart

Harvard Graphics Symbol File (v3)

Harvard Graphics Configuration File

Harvard Graphics Palette

Lotus 1-2-3 Release 9

application/x-lotus-123

Applix Spreadsheets

application/x-applix-spreadsheet

Microsoft Pocket Word for Handheld PC

Microsoft Device Independent

image/bmp


Extension UNI WKS, WK1, WK3, WK4 FM3 123 LWP, MWP LWP, MWP PRZ PRE, FLW PRS PRZ PRZ DOC DOC, WPS, WBK XLS XLC

File Class adWORDPROCESSOR adSPREADSHEET

Readers unisr wkssr

adSPREADSHEET adSPREADSHEET adWORDPROCESSOR adWORDPROCESSOR adPRESENTATION adPRESENTATION adPRESENTATION adPRESENTATION adPRESENTATION adWORDPROCESSOR adWORDPROCESSOR adSPREADSHEET adSPREADSHEET

l123sr l123sr lwpsr lwpsr kpprzrdr kpprerdr kpprerdr kpprzrdr kpprzrdr mw6sr mw8sr xlssr xlssr

XLM

adSPREADSHEET

xlssr

XLS XLS, XLR SHW, PRC PR4 CH3, CHT SY3
PL 123 AS PWD

adSPREADSHEET adSPREADSHEET adPRESENTATION adPRESENTATION adVECTORGRAPHIC adVECTORGRAPHIC adVECTORGRAPHIC adVECTORGRAPHIC adSPREADSHEET adSPREADSHEET adWORDPROCESSOR

xlssr xlssr kpshwrdr
l123sr assr rtfsr

DIB

adRASTERIMAGE


Format Name
MS_Word_2000_Fmt Excel_2000_Fmt PowerPoint_2000_Fmt MS_Access_2000_Fmt MS_Project_4_Fmt MS_Project_41_Fmt MS_Project_98_Fmt Folio_Flat_Fmt HWP_Fmt
ICHITARO_Fmt IS_XML_Fmt Oasys_Fmt PBM_ASC_Fmt
PBM_BIN_Fmt
PGM_ASC_Fmt
PGM_BIN_Fmt
PPM_ASC_Fmt
PPM_BIN_Fmt
XBM_Fmt XPM_Fmt FPX_Fmt PCD_Fmt MS_Visio_Fmt
MS_Project_2000_Fmt

Number
321 322 323 324 325 326 327 328 329
330 331 332 333
334
335
336
337
338
339 340 341 342 343

Category
269 188 272 263 281 281 281 282 283
284 273 286 287
287
288
288
289
289
290 291 292 293 294

Description

MIME Type

Bitmap

Microsoft Word 2000

application/msword

Microsoft Excel 2000

application/x-ms-excel

Microsoft PowerPoint 2000

application/x-ms-powerpoint

Microsoft Access 2000

application/x-msaccess

Microsoft Project 4

Microsoft Project 4.1

Microsoft Project 98

application/vnd.ms-project

Folio Flat File

Haansoft Hangul HWP (Arae-Ah Hangul)

application/x-hwp

ICHITARO (v4-10)

application/x-ichitaro

Extended or Custom XML

text/xml

Fujitsu OASYS

application/vnd.fujitsu.oasys

Portable Bitmap Utilities ASCII format (PBM)

image/pbm

Portable Bitmap Utilities BINARY format (PBM)

image/pbm

Portable Greymap Utilities ASCII format (PGM)

image/x-pgm

Portable Greymap Utilities BINARY format (PGM)

image/x-pgm

Portable Pixmap Utilities ASCII format (PPM)

image/x-portable-pixmap

Portable Pixmap Utilities BINARY format (PPM)

image/x-portable-pixmap

X-Window X Bitmap format (XBM)

image/x-xbitmap

X-Window X Pixmap format (XPM)

image/xpm

Kodak FlashPix FPX Image format

image/fpx

PCD Image format

image/pcd

Microsoft Visio (up to version 11)

image/x-vsd

344

281

Microsoft Project 2000

application/vnd.ms-project


Extension

File Class

Readers

DOC XLS PPT MDB MPP MPP MPP FFF HWP
JTD XML OAS, OA2, OA3 PBM, PNM
PBM, PNM
PGM, PNM
PGM, PNM
PPM, PNM
PPM, PNM
XBM XPM FPX PCD VSD
MPP

adWORDPROCESSOR adSPREADSHEET adPRESENTATION adDATABASE adSCHEDULE adSCHEDULE adSCHEDULE adWORDPROCESSOR adWORDPROCESSOR

mw8sr xlssr kpp97rdr mdbsr mppsr mppsr mppsr foliosr hwposr, hwpsr

adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adRASTERIMAGE

jtdsr oa2sr

adRASTERIMAGE

adRASTERIMAGE

adRASTERIMAGE

adRASTERIMAGE

adRASTERIMAGE

adRASTERIMAGE adRASTERIMAGE adRASTERIMAGE adRASTERIMAGE adPRESENTATION
adSCHEDULE

olesr
kpVSD2rdr, vsdsr mppsr


Format Name MS_Outlook_Fmt ELF_Relocatable_Fmt ELF_Executable_Fmt ELF_Dynamic_Lib_Fmt MS_Word_XML_Fmt MS_Excel_XML_Fmt MS_Visio_XML_Fmt SO_Text_XML_Fmt
SO_Spreadsheet_XML_Fmt
SO_Presentation_XML_Fmt
XHTML_Fmt MS_OutlookPST_Fmt
RAR_Fmt
Lotus_Notes_NSF_Fmt
Macromedia_Flash_Fmt MS_Word_2007_Fmt MS_Excel_2007_Fmt MS_PPT_2007_Fmt OpenPGP_Fmt
Intergraph_V7_DGN_Fmt
MicroStation_V8_DGN_Fmt MS_Word_Macro_2007_Fmt MS_Excel_Macro_2007_Fmt MS_PPT_Macro_2007_Fmt
LZH_Fmt

Number 345 346 347 348 349 350 351 352
353
354
355 356
357

Category 295 159 158 160 285 285 285 314
315
316
296 297
298

Description

MIME Type

Microsoft Outlook message

application/vnd.ms-outlook

ELF Relocatable

application/octet-stream

ELF Executable

application/octet-stream

ELF Dynamic Library

application/octet-stream

Microsoft Word 2003 XML

text/xml

Microsoft Excel 2003 XML

text/xml

Microsoft Visio 2003 XML

text/xml

OpenDocument format (OpenOffice 1/StarOffice 6,7) Text XML

application/vnd.sun.xml.writer

OpenDocument format (OpenOffice 1/StarOffice 6,7) Spreadsheet XML

application/vnd.sun.xml.calc

OpenDocument format (OpenOffice 1/StarOffice 6,7) Presentation XML

application/vnd.sun.xml.impress

XHTML

text/xhtml

Microsoft Outlook Personal Folders File (.pst)

application/vnd.ms-outlook-pst

RAR archive format

application/x-rar-compressed

358

299

359

300

360

301

361

302

362

303

363

304

364

305

365

306

366

307

367

308

368

309

IBM Lotus Notes Database NSF/NTF Macromedia Flash (.swf) Microsoft Word 2007 XML - Docx Microsoft Excel 2007 XML Microsoft PowerPoint 2007 XML OpenPGP/GPG Message Format (with new packet format) Intergraph Standard File Format (ISFF) V7 DGN (non-OLE) MicroStation V8 DGN (OLE) Microsoft Word Macro 2007 XML Microsoft Excel Macro 2007 XML Microsoft PPT Macro 2007 XML

application/x-lotus-notes application/x-shockwave-flash application/x-ms-word07 application/x-ms-excel07 application/x-ms-powerpoint07 application/pgp-encrypted
application/x-ms-word07m application/x-ms-excel07m application/x-ms-powerpoint07m

369

310

LZH Archive

application/x-lzh-compressed


Extension MSG, OFT O
SO XML XML VDX SXW

File Class adENCAPSULATION adOBJECTMODULE adEXECUTABLE adLIBRARY adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR

Readers msgsr
xmlsr xmlsr xmlsr odfwpsr

SXC, STC

adSPREADSHEET

sosr

SXD, SXI

adPRESENTATION

kpodfrdr

XML, XHTML, XHT PST
RAR, REV, R00, R01 NSF

adWORDPROCESSOR adENCAPSULATION
adENCAPSULATION, adEXECUTABLE adENCAPSULATION

pstnsr, pstsr, pstxsr rarsr
nsfsr

SWF, SWD

adWORDPROCESSOR

DOCX, DOTX

adWORDPROCESSOR

XLSX, XLTX

adSPREADSHEET

PPTX, POTX, PPSX adPRESENTATION

GPG, PGP

adENCAPSULATION

swfsr mwxsr xlsxsr kpppxrdr

DGN

adVECTORGRAPHIC

DGN

adVECTORGRAPHIC

DOCM, DOTM

adWORDPROCESSOR

XLSM, XLTM, XLAM adSPREADSHEET

PPTM, POTM, PPSM, PPAM

adPRESENTATION

LZH, LHA

adENCAPSULATION

olesr mwxsr xlsxsr kpppxrdr
lzhsr


Format Name | Number | Category | Description | MIME Type | Extension | File Class | Readers
Office_2007_Fmt | 370 | 311 | Office 2007 document that cannot be further classified (often RMS-encrypted) | | DOCX, XLSX, PPTX, XLSB | adMISC |
MS_XPS_Fmt | 371 | 312 | Microsoft Open XML Paper Specification (XPS/OXPS) | application/vnd.ms-xpsdocument | XPS, OXPS | adWORDPROCESSOR | xpssr
Lotus_Domino_DXL_Fmt | 372 | 313 | IBM Domino Data in XML format (.dxl) | application/x-dxlfile | DXL | adENCAPSULATION | dxlsr
ODF_Text_Fmt | 373 | 314 | ODF Text | application/vnd.oasis.opendocument.text | ODT | adWORDPROCESSOR | odfwpsr
ODF_Spreadsheet_Fmt | 374 | 315 | ODF Spreadsheet | application/vnd.oasis.opendocument.spreadsheet | ODS | adSPREADSHEET | odfsssr
ODF_Presentation_Fmt | 375 | 316 | ODF Presentation | application/vnd.oasis.opendocument.presentation | ODP | adPRESENTATION | kpodfrdr
Legato_Extender_ONM_Fmt | 376 | 317 | Legato Extender Native Message ONM | application/x-lotus-notes | ONM | adENCAPSULATION | onmsr
bin_Unknown_Fmt | 377 | 318 | Bin unknown format (.xxx) | | | adWORDPROCESSOR |
TNEF_Fmt | 378 | 319 | Transport Neutral Encapsulation Format (TNEF) | application/vnd.ms-tnef | | adENCAPSULATION | tnefsr
CADAM_Drawing_Fmt | 379 | 320 | CADAM Drawing | | CDD | adVECTORGRAPHIC |
CADAM_Drawing_Overlay_Fmt | 380 | 321 | CADAM Drawing Overlay | | CDO | adVECTORGRAPHIC |
NURSTOR_Drawing_Fmt | 381 | 322 | NURSTOR Drawing | | NUR | adVECTORGRAPHIC |
HP_GLP_Fmt | 382 | 323 | HP Graphics Language (Plotter) | vector/x-hpgl2 | HPG | adVECTORGRAPHIC |
ASF_Fmt | 383 | 324 | Advanced Systems Format (ASF) | application/x-ms-asf | ASF | adMISC | asfsr
WMA_Fmt | 384 | 325 | Windows Media Audio Format (WMA) | audio/x-ms-wma | WMA | adSOUND | asfsr
WMV_Fmt | 385 | 326 | Windows Media Video Format (WMV) | video/x-ms-wmv | WMV | adMOVIE | asfsr
EMX_Fmt | 386 | 327 | Legato EMailXtender Archives Format (EMX) | | EMX | adENCAPSULATION | emxsr
Z7Z_Fmt | 387 | 328 | 7-Zip archive (7z) | application/7z | 7Z | adENCAPSULATION, adEXECUTABLE | z7zsr
MS_Excel_Binary_2007_Fmt | 388 | 329 | Microsoft Excel Binary 2007 | application/vnd.ms-excel.sheet.binary.macroenabled.12 | XLSB | adSPREADSHEET | xlsbsr
CAB_Fmt | 389 | 330 | Microsoft Cabinet File (CAB) | application/vnd.ms-cab-compressed | CAB | adENCAPSULATION | cabsr
CATIA_Fmt | 390 | 331 | CATIA Formats (CAT*) | | CATPART, CATPRODUCT2 | adVECTORGRAPHIC | kpCATrdr
YIM_Fmt | 391 | 332 | Yahoo! Instant Messenger History | | DAT | adWORDPROCESSOR | yimsr
ODF_Drawing_Fmt | 392 | 316 | ODF Drawing/Graphics | application/vnd.oasis.opendocument.graphics | ODG | adVECTORGRAPHIC | kpodfrdr
Founder_CEB_Fmt | 393 | 333 | Founder Chinese E-paper Basic | application/ceb | CEB | adWORDPROCESSOR | cebsr


Format Name
QPW_Fmt MHT_Fmt MDI_Fmt GRV_Fmt IWWP_Fmt IWSS_Fmt IWPG_Fmt BKF_Fmt MS_Access_2007_Fmt ENT_Fmt
DMG_Fmt CWK_Fmt OO3_Fmt OPML_Fmt Omni_Graffle_XML_Fmt PSD_Fmt Apple_Binary_PList_Fmt Apple_iChat_Fmt OOUTLINE_Fmt BZIP2_Fmt ISO_Fmt DocuWorks_Fmt RealMedia_Fmt AC3Audio_Fmt NEF_Fmt SolidWorks_Fmt
XFDL_Fmt

Number
394 395 396 397 398 399 400 401 402 403
404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419

Category
334 335 336 337 338 339 340 341 342 343
344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359

Description

MIME Type

(ceb)

Corel Quattro Pro 9+ for Windows application/quattro-pro MIME HTML MHTML format (MHT)1 multipart/related

Microsoft Document Imaging Format image/vnd.ms-modi

Microsoft Office Groove Format

application/vnd.groove-injector

Apple iWork Pages format

application/vnd.apple.pages

Apple iWork Numbers format

application/vnd.apple.numbers

Apple iWork Keynote format

application/vnd.apple.keynote

Microsoft Windows Backup File

Microsoft Access 2007

application/msaccess

Microsoft Entourage Database Format

Mac Disk Copy Disk Image File

application/x-apple-diskimage

AppleWorks (Claris Works) File

application/appleworks

Omni Outliner V3 File

Omni Outliner OPML File

Omni Graffle XML File

Adobe Photoshop Document

image/vnd.adobe.photoshop

Apple Binary Property List format application/x-bplist

Apple iChat format

OOutliner File

Bzip 2 Compressed File

application/x-bzip2

ISO-9660 CD Disc Image Format application/x-iso9660-image

DocuWorks Format

application/vnd.fujixerox.docuworks

RealMedia Streaming Media

application/vnd.rn-realmedia

AC3 Audio File Format

audio/ac3

Nero Encrypted File

SolidWorks Format Files

420

366

Extensible Forms Description Language

application/x-xfdl

Extension

File Class

Readers

QPW MHT, MHTML MDI GRV PAGES NUMBERS KEY BKF ACCDB

adSPREADSHEET adWORDPROCESSOR adRASTERIMAGE adWORDPROCESSOR adWORDPROCESSOR adSPREADSHEET adPRESENTATION adENCAPSULATION adDATABASE adENCAPSULATION

qpwsr mhtsr
iwwpsr iwsssr kpIWPGrdr bkfsr mdbsr entsr

DMG, ISO, IMAGE

adENCAPSULATION

CWK

adWORDPROCESSOR

OO3

adWORDPROCESSOR

OPML

adWORDPROCESSOR

GRAFFLE

adVECTORGRAPHIC

PSD, PSB

adRASTERIMAGE

PLIST

adMISC

ICHAT

adWORDPROCESSOR

OOUTLINE

adWORDPROCESSOR

BZ2

adENCAPSULATION

ISO

adENCAPSULATION

XDW

adWORDPROCESSOR

RM, RA

adMOVIE

AC3

adSOUND

NEF

adENCAPSULATION

SLDASM, SLDPRT, SLDDRW, SLDDRT

adVECTORGRAPHIC

XFDL, XFD

adPRESENTATION

dmgsr stringssr oo3sr oo3sr kpGFLrdr psdsr
ichatsr oo3sr bzip2sr isosr
olesr kpXFDLrdr

Format Name Apple_XML_PList_Fmt OneNote_Fmt IFilter_Fmt Dicom_Fmt
EnCase_Fmt
Scrap_Fmt MS_Project_2007_Fmt MS_Publisher_98_Fmt Skype_Fmt Hl7_Fmt MS_OutlookOST_Fmt
Epub_Fmt
MS_OEDBX_Fmt
BB_Activ_Fmt DiskImage_Fmt Milestone_Fmt

Number 421 422 423 424
425
426 427 428 429 430 431
432
433
434 435 436

Category 367 368 369 370
371
372 373 374 375 377 378
379
380
381 382 383

Description

MIME Type

Apple XML Property List format

application/x-plist

Microsoft OneNote Note Format

application/onenote

iFilter

Digital Imaging and Communications in Medicine (Dicom)

application/dicom

Expert Witness Compression Format (EnCase)

Shell Scrap Object File

Microsoft Project 2007

application/vnd.ms-project

Microsoft Publisher from version 98 application/x-mspublisher

Skype Log File

Health level7 message

Microsoft Outlook Offline Folders File (OST)

application/vnd.ms-outlook-pst

Open Publication Structure electronic publication

application/epub+zip

Microsoft Outlook Express DBX Message Database

BlackBerry Activation File

Disk Image

Milestone Document

E_Transcript_Fmt PostScript_Font_Fmt Ghost_DiskImage_Fmt JPEG_2000_JP2_File_Fmt
Unicode_HTML_Fmt CHM_Fmt EMCMF_Fmt MS_Access_2007_Tmpl_Fmt Jungum_Fmt

437

384

438

385

439

386

440

387

441

388

442

389

443

390

444

391

445

392

RealLegal E-Transcript File

PostScript Type 1 Font

application/x-font

Ghost Disk Image File

JPEG-2000 JP2 File Format Syntax (ISO/IEC 15444-1)

image/jp2

Unicode HTML

text/html

Microsoft Compiled HTML Help

application/x-chm

Documentum EMCMF format

Microsoft Access 2007 Template

Samsung Electronics Jungum

application/jungum

Extension PLIST ONE
DCM

File Class adMISC adWORDPROCESSOR adWORDPROCESSOR adRASTERIMAGE

Readers onesr dcmsr

E01, L01, LX01
SHS MPP PUB DBB HL7 OST

adENCAPSULATION
adENCAPSULATION adSCHEDULE adDESKTOPPUBLSH adWORDPROCESSOR adWORDPROCESSOR adENCAPSULATION

encase2sr, encasesr olesr mppsr mspubsr skypesr hl7sr pffsr

EPUB

adWORDPROCESSOR epubsr

DBX

adENCAPSULATION

dbxsr

DAT DMG MLS, ML3, ML4, ML5, ML6, ML7, ML8, ML9, MLA PTX PFB GHO, GHS JP2, JPF, J2K, JPWL, JPX, PGX HTM, HTML CHM EMCMF ACCDT GUL

adWORDPROCESSOR adENCAPSULATION adRASTERIMAGE

adWORDPROCESSOR adFONT adENCAPSULATION adRASTERIMAGE
adWORDPROCESSOR adENCAPSULATION adENCAPSULATION adDATABASE adWORDPROCESSOR

pfasr
jp2000sr, kpjp2000rdr unihtmsr chmsr msgsr

Format Name
JBIG2_Fmt EFax_Fmt AD1_Fmt SketchUp_Fmt GWFS_Email_Fmt JNT_Fmt Yahoo_yChat_Fmt PaperPort_MAX_File_Fmt ARJ_Fmt
RPMSG_Fmt
MAT_Fmt SGY_Fmt CDXA_MPEG_PS_Fmt
EVT_Fmt EVTX_Fmt MS_OutlookOLM_Fmt
WARC_Fmt JAVACLASS_Fmt VCF_Fmt EDB_Fmt
ICS_Fmt
MS_Visio_2013_Fmt

Number
446 447 448 449 450 451 452 453 454
455
456 457 458
459 460 461
462 463 464 465
466
467

Category
393 394 395 396 397 398 399 400 402
403
404 405 406
407 408 409
410 411 412 413
414
415

Description

MIME Type

Global document

JBIG2 File Format

image/jbig2

eFax file

AD1 Evidence file

Google SketchUp

GroupWise FileSurf email

Windows Journal format

Yahoo! Messenger chat log

PaperPort MAX image file

image/max

ARJ (Archive by Robert Jung) file format

application/arj

Microsoft Outlook Restricted Permission Message

application/x-microsoft-rpmsg-message

MATLAB file format

application/x-matlab-data

SEG-Y Seismic Data format

MPEG-PS container with CDXA stream

video/mpeg

Microsoft Windows NT Event Log

Microsoft Windows Vista Event Log

Microsoft Outlook for Macintosh format

Web ARChive

application/warc

Java Class format

application/x-java-class

Microsoft Outlook vCard file format text/vcard

Microsoft Exchange Server Database file format

Microsoft Outlook iCalendar file format

text/calendar

Microsoft Visio 2013

application/vnd.visio

MS_Visio_2013_Macro_Fmt ICHITARO_Compr_Fmt

468

415

469

417

Microsoft Visio 2013 macro ICHITARO Compressed format

application/vnd.visio application/x-js-taro

Extension

File Class

Readers

JB2, JBIG2 EFX AD1 SKP GWFS JNT YCHAT MAX ARJ

adRASTERIMAGE adRASTERIMAGE adENCAPSULATION adVECTORGRAPHIC adENCAPSULATION adWORDPROCESSOR adWORDPROCESSOR adRASTERIMAGE adENCAPSULATION

kpJBIG2rdr ad1sr gwfssr
multiarcsr

RPMSG

adENCAPSULATION

rpmsgsr

MAT, FIG SGY, SEGY MPG

adWORDPROCESSOR adWORDPROCESSOR adMOVIE

EVT EVTX OLM

adMISC adMISC adENCAPSULATION

olmsr

WARC CLASS VCF EDB

adENCAPSULATION adWORDPROCESSOR adWORDPROCESSOR adENCAPSULATION

vcfsr

ICS, VCS

adENCAPSULATION

icssr

VSDX, VSTX, VSSX adPRESENTATION

VSDM, VSTM, VSSM
JTDC

adPRESENTATION adWORDPROCESSOR

ActiveX components, kpVSDXrdr kpVSDXrdr
jtdsr

Format Name IWWP13_Fmt IWSS13_Fmt IWPG13_Fmt

Number 470 471 472

Category 418 419 420

Description Apple iWork 2013 Pages format Apple iWork 2013 Numbers format Apple iWork 2013 Keynote format

MIME Type

XZ_Fmt

473

421

Sony_WAVE64_Fmt

474

422

Conifer_WAVPACK_Fmt

475

423

Xiph_OGG_VORBIS_Fmt

476

424

MS_Visio_2013_Stencil_Fmt

477

415

MS_Visio_2013_Stencil_Macro_Fmt 478

415

MS_Visio_2013_Template_Fmt

479

415

MS_Visio_2013_Template_Macro_Fmt 480

415

Borland_Reflex_2_Fmt PKCS_12_Fmt B1_Fmt ISO_IEC_MPEG_4_Fmt

481

425

482

426

483

427

484

428

RAR5_Fmt Unigraphics_NX_Fmt PTC_Creo_Fmt KML_Fmt KMZ_Fmt WML_Fmt ODF_Formula_Fmt SO_Text_Fmt

485

429

486

362

487

430

488

431

489

432

490

433

491

434

492

435

XZ archive format

application/x-xz

Sony Wave64 format

audio/wav64

Conifer Wavpack format

audio/x-wavpack

Xiph Ogg Vorbis format

audio/ogg

MS Visio 2013 stencil format

application/vnd.visio

MS Visio 2013 stencil Macro format application/vnd.visio

MS Visio 2013 template format

application/vnd.visio

MS Visio 2013 template Macro format

application/vnd.visio

Borland Reflex 2 format

PKCS #12 (p12) format

application/x-pkcs12

B1 format

application/x-b1

ISO/IEC MPEG-4 (ISO 14496) format

video/mp4

RAR5 Format

application/x-rar-compressed

Unigraphics (UG) NX CAD Format

PTC Creo CAD Format

Keyhole Markup Language

application/vnd.google-earth.kml+xml

Zipped Keyhole Markup Language application/vnd.google-earth.kmz

Wireless Markup Language

text/vnd.wap.wml

ODF Formula

application/vnd.oasis.opendocument.formula

Star Office 4,5 Writer Text

application/vnd.stardivision.writer

SO_Spreadsheet_Fmt SO_Presentation_Fmt SO_Math_Fmt STEP_Fmt STL_Fmt

493

436

494

437

495

438

496

439

497

364

Star Office 4,5 Calc Spreadsheet application/vnd.stardivision.calc

Star Office 4,5 Impress Presentation application/vnd.stardivision.draw

Star Office 4,5 Math

application/vnd.stardivision.math

ISO 10303-21 STEP format

3D Systems Stereo Lithography STL

Extension IWA, PAGES IWA, NUMBERS IWA, KEY
XZ W64 WV OGG VSSX VSSM VSTX VSTM

File Class adWORDPROCESSOR adSPREADSHEET adPRESENTATION
adENCAPSULATION adSOUND adSOUND adSOUND adPRESENTATION adPRESENTATION adPRESENTATION adPRESENTATION

Readers iwwp13sr iwss13sr kpIWPG13rdr, kpIWPGrdr multiarcsr
kpVSDXrdr kpVSDXrdr kpVSDXrdr kpVSDXrdr

R2D P12, PFX B1 MP4

adDATABASE adWORDPROCESSOR adENCAPSULATION adMOVIE

b1sr mpeg4sr

RAR PRT ASM, PRT KML KMZ WML ODF SDW, SGL, VOR
SDC SDD, SDA SMF

adENCAPSULATION adVECTORGRAPHIC adVECTORGRAPHIC adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR
adSPREADSHEET adPRESENTATION adMISC adMISC adCAD

multiarcsr kpUGrdr
xmlsr unzip xmlsr unzip kpsdwrdr, starwsr starcsr kpsddrdr olesr

Format Name
AppleScript_Fmt Assembly_Fmt C_Fmt Csharp_Fmt CPlusPlus_Fmt Css_Fmt Clojure_Fmt CoffeeScript_Fmt Lisp_Fmt Dockerfile_Fmt Eiffel_Fmt Erlang_Fmt Fsharp_Fmt Fortran_Fmt Go_Fmt Groovy_Fmt Haskell_Fmt Ini_Fmt Java_Fmt Javascript_Fmt Lua_Fmt Makefile_Fmt Mathematica_Fmt
ObjC_Fmt ObjCpp_Fmt ObjJ_Fmt PHP_Fmt PLSQL_Fmt Pascal_Fmt

Number
498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520
521 522 523 524 525 526

Category
440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462
464 465 466 467 468 469

Description ASCII format AppleScript Source Code3 Assembly Code3 C Source Code3 C# Source Code3 C++ Source Code3 Cascading Style Sheet3 Clojure Source Code3 CoffeeScript Source Code3 Common Lisp Source Code3 Dockerfile3 Eiffel Source Code3 Erlang Source Code3 F# Source Code3 Fortran Source Code3 Go Source Code3 Groovy Source Code3 Haskell Source Code3 Initialization (INI) file3 Java Source Code3 Javascript Source Code3 Lua Source Code3 Makefile3 Wolfram Mathematica Source Code3 Objective-C Source Code3 Objective-C++ Source Code3 Objective-J Source Code3 PHP Source Code3 PLSQL Source Code3 Pascal Source Code3

MIME Type
text/x-applescript text/x-assembly text/x-c text/x-csharp text/x-c++ text/css text/x-clojure text/x-coffeescript text/x-common-lisp text/x-dockerfile text/x-eiffel text/x-erlang text/x-fsharp text/x-fortran text/x-go text/x-groovy text/x-haskell text/x-ini text/x-java-source text/javascript text/x-lua text/x-makefile text/x-mathematica
text/x-objc text/x-objectivec++ text/x-objectivej text/x-php text/x-plsql text/x-pascal

Extension

File Class

APPLESCRIPT
C, H CS CPP, HPP CSS CLJ, CL2 COFFEE, CAKE EL
E ERL, ES FS F GO GRT, GVY HS
JAVA JS LUA MAKE M
J PHP
PASCAL

adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE
adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE

Readers
afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr
afsr afsr afsr afsr afsr afsr

Format Name | Number | Category | Description | MIME Type | Extension | File Class | Readers
Perl_Fmt | 527 | 470 | Perl Source Code³ | text/x-perl | PL | adSOURCECODE | afsr
Powershell_Fmt | 528 | 471 | PowerShell Source Code³ | text/x-powershell | PS1 | adSOURCECODE | afsr
Prolog_Fmt | 529 | 472 | Prolog Source Code³ | text/x-prolog | PRO, PROLOG | adSOURCECODE | afsr
Puppet_Fmt | 530 | 473 | Puppet Source Code³ | text/x-puppet | PP | adSOURCECODE | afsr
Python_Fmt | 531 | 474 | Python Source Code³ | text/x-python | PY | adSOURCECODE | afsr
R_Fmt | 532 | 475 | R Source Code³ | text/x-rsrc | R | adSOURCECODE | afsr
Ruby_Fmt | 533 | 476 | Ruby Source Code³ | text/x-ruby | RB | adSOURCECODE | afsr
Rust_Fmt | 534 | 477 | Rust Source Code³ | text/x-rust | RS | adSOURCECODE | afsr
Scala_Fmt | 535 | 478 | Scala Source Code³ | text/x-scala | SC | adSOURCECODE | afsr
Shell_Fmt | 536 | 479 | Shell Script³ | application/x-sh | SH | adSOURCECODE | afsr
Smalltalk_Fmt | 537 | 480 | Smalltalk Source Code³ | text/x-stsrc | ST | adSOURCECODE | afsr
ML_Fmt | 538 | 481 | Standard ML Source Code³ | text/x-ml | ML | adSOURCECODE | afsr
Swift_Fmt | 539 | 482 | Swift Source Code³ | text/x-swift | SWIFT | adSOURCECODE | afsr
Tcl_Fmt | 540 | 483 | Tool Command Language (Tcl) Source Code³ | text/x-tcl | TM | adSOURCECODE | afsr
Tex_Fmt | 541 | 484 | TeX Typesetting File³ | application/x-tex |  | adSOURCECODE | afsr
TypeScript_Fmt | 542 | 485 | TypeScript Source Code³ | text/x-typescript | TS | adSOURCECODE | afsr
Verilog_Fmt | 543 | 486 | Verilog Source Code³ | text/x-verilog | V | adSOURCECODE | afsr
YAML_Fmt | 544 | 487 | YAML File³ | text/x-yaml | YML | adSOURCECODE | afsr
Wiki_Fmt | 545 | 488 | MediaWiki File³ | text/x-mediawiki |  | adWORDPROCESSOR | afsr
MS_Word_2007_Flat_XML_Fmt | 546 | 301 | Microsoft Word 2007 XML - Flat xml | text/xml | XML | adWORDPROCESSOR | mwxsr
Matroska_Fmt | 547 | 489 | Matroska video/audio File | video/x-matroska | MKV, MKA | adMOVIE |
SVG_Fmt | 548 | 490 | Scalable Vector Graphics image | image/svg+xml | SVG | adVECTORGRAPHIC | xmlsr
Shapefile_Fmt | 549 | 491 | Shapefile | application/x-shapefile | SHP, SHX | adGIS |
Flash_Video_Fmt | 550 | 492 | Flash video File | video/x-flv | FLV | adMOVIE |
Embedded_OpenType_Fmt | 551 | 493 | Embedded OpenType font | application/vnd.ms-fontobject | EOT | adFONT |
Web_Open_Font_Fmt | 552 | 494 | Web Open Font Format | font/woff | WOFF, WOFF2 | adFONT |
OpenType_Fmt | 553 | 495 | OpenType Font | font/otf | OTF | adFONT |
MNG_Fmt | 554 | 496 | Multiple-image Network Graphics | video/x-mng | MNG | adANIMATION |
JNG_Fmt | 555 | 497 | JPEG Network Graphics | image/x-jng | JNG | adRASTERIMAGE |
AppleScript_Binary_Fmt | 556 | 498 | AppleScript Binary Source Code |  | SCPT | adSOURCECODE |

Format Name Maya_Binary_Fmt Jupiter_Tesselation_Fmt OGV_Fmt OGG_Container_Fmt GNU_Message_Catalog_Fmt Windows_Shortcut_Fmt Apple_Typedstream_Fmt
XCF_Fmt PaintShop_Pro_Fmt SQLite_Database_Fmt MySQL_Table_Fmt Microsoft_Program_DB_Fmt OpenEXR_Fmt XMV_Fmt AMV_Fmt NIFF_Fmt CuBase_Fmt SoundFont_Fmt WebP_Fmt ICC_Fmt PCF_Fmt WebM_Fmt AMFF_Fmt ANBM_Fmt ANIM_Fmt
DEEP_Fmt FAXX_Fmt ICON_Fmt ILBM_Fmt

Number 557 558 559 560 561 562 563
564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581
582 583 584 585

Category 499 363 500 501 502 503 504
505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522
523 524 525 526

Description

MIME Type

Autodesk Maya binary file

UGS Jupiter Tesselation file

Ogg Theora Video format

video/ogg

General Ogg Container format

application/ogg

GNU Message Catalog format

Windows shortcut file

application/x-ms-shortcut

Apple/NeXT typedstream data format

GIMP XCF image

image/x-xcf

PaintShop Pro image

SQLite database format

application/x-sqlite3

MySQL table definition file

Microsoft Program Database format

OpenEXR image format

4X Movie File

AMV video file

Notation Interchange File Format

Steinberg CuBase file

SoundFont file

WebP image

image/webp

International Color Consortium files application/vnd.iccprofile

X11 Portable Compiled Font file

application/x-font-pcf

WebM video file

video/webm

Amiga Metafile

IFF Animated Bitmap

IFF Amiga animated raster graphics format

IFF-DEEP TVPaint image

IFF-FAXX Facsimile image

IFF Glow Icon image

Interleaved BitMap image

Extension MB JT OGV OGG MO LNK

File Class adCAD adCAD adMOVIE adMISC adMISC adMISC adMISC

Readers

XCF PSP, PSPIMAGE QHC FRM PDB EXR 4XM AMV NIF
WEBP ICC, ICM PCF WEBM AMF

adRASTERIMAGE adRASTERIMAGE adDATABASE adDATABASE adDATABASE adRASTERIMAGE adMOVIE adMOVIE adSOUND adSOUND adSOUND adRASTERIMAGE adMISC adFONT adMOVIE adVECTORGRAPHIC adRASTERIMAGE adRASTERIMAGE

DEEP IFF

adRASTERIMAGE adRASTERIMAGE adRASTERIMAGE adRASTERIMAGE

Format Name LWOB_Fmt MAUD_Fmt
PBM_Fmt TDDD_Fmt
DjVu_Fmt InDesign_Fmt Calamus_Fmt Adaptive_MultiRate_Fmt FLAC_Fmt Ogg_FLAC_Fmt SAS7BDAT_Fmt
Design_Web_Format_Fmt Adobe_Flash_Audio_Book_Fmt Adobe_Flash_Audio_Fmt Adobe_Flash_Protected_Video_Fmt Adobe_Flash_Video_Fmt Audible_Audiobook_Fmt Canon_Camera_Fmt Canon_Raw_Fmt Casio_Camera_Fmt Convergent_Design_Fmt DMB_MAF_Audio_Fmt DMB_MAF_Video_Fmt DMP_Content_Fmt
DVB_Fmt Dirac_Wavelet_Compression_Fmt
HEICS_Image_Sequence_Fmt

Number 586 587
588 589
590 591 592 593 594 595 596
597 598 599 600 601 602 603 604 605 606 607 608 609
610 611
612

Category 527 528
529 530
531 532 533 534 535 536 537
538 539 540 541 542 543 544 545 546 547 548 549 550
551 552
553

Description

MIME Type

LightWave Object format

IFF-MAUD MacroSystem audio format

IFF Planar BitMap

IFF TDDD and Imagine Object animation format

AT&T DjVu format

image/vnd.djvu

Adobe InDesign document

application/x-indesign

Calamus Desktop Publishing

Adaptive Multi-Rate audio format audio/amr

Free Lossless Audio Codec format audio/flac

Ogg Container FLAC audio format

SAS7BDAT database storage format

Autodesk Design Web Format

model/vnd.dwf

Adobe Flash Player audio book

audio/mp4

Adobe Flash Player audio

audio/mp4

Adobe Flash Player protected video video/mp4

Adobe Flash Player video

video/x-f4v

Audible Enhanced Audiobook

audio/vnd.audible.aax

Canon Digital Camera image

Canon Raw image

Casio Digital Camera image

Convergent Design file

DMB MAF audio

DMB MAF video

Digital Media Project Content Format

Digital Video Broadcast format

video/vnd.dvb.file

ISO-BMFF Dirac Wavelet compression

High Efficiency Image Format HEVC image sequence

image/heic-sequence

Extension LWOB
TDD DJVU INDD AMR FLAC OGG SAS7BDAT DWF F4B F4A F4P F4V AAX CR3
DVB
HEICS

File Class adMISC adSOUND

Readers

adRASTERIMAGE adRASTERIMAGE

adWORDPROCESSOR adDESKTOPPUBLSH adDESKTOPPUBLSH adSOUND adSOUND adSOUND adDATABASE

sassr

adCAD adSOUND adSOUND adMOVIE adMOVIE adSOUND adRASTERIMAGE adRASTERIMAGE adRASTERIMAGE adRASTERIMAGE adSOUND adMOVIE adMISC

mpeg4sr mpeg4sr mpeg4sr mpeg4sr mpeg4sr

adMOVIE adMISC

adRASTERIMAGE

Format Name HEIC_Image_Fmt
HEIFS_Image_Sequence_Fmt
HEIF_Image_Fmt ISMACryp_Fmt ISO_3GPP2_Fmt ISO_3GPP_Fmt ISO_JPEG2000_JP2_Fmt
ISO_JPEG2000_JPM_Fmt
ISO_JPEG2000_JPX_Fmt
ISO_QuickTime_Fmt KDDI_Video_Fmt MAF_Photo_Player_Fmt MPEG4_AVC_Fmt
MPEG4_M4A_Fmt MPEG4_M4B_Fmt MPEG4_M4P_Fmt
MPEG4_M4V_Fmt MPEG4_Sony_PSP_Fmt MPEG_21_Fmt Mobile_QuickTime_Fmt Motion_JPEG_2000_Fmt
NTT_MPEG4_Fmt Nero_MPEG4_AVC_Profile
Nero_MPEG4_Audio_Fmt Nero_MPEG4_Profile

Number 613
614
615 616 617 618 619

Category 554
555
556 557 558 559 560

Description

MIME Type

High Efficiency Image Format HEVC image

image/heic

High Efficiency Image Format image sequence

image/heif-sequence

High Efficiency Image Format image

image/heif

ISMACryp 2.0 Encrypted format

3GPP2 video file

video/3gpp2

3GPP video file

video/3gpp

ISO-BMFF JPEG 2000 image

image/jp2

620

561

621

562

622

563

623

564

624

565

625

566

626

567

627

568

628

569

629

570

630

571

631

572

632

573

633

574

ISO-BMFF JPEG 2000 compound image ISO-BMFF JPEG 2000 with extensions Apple ISO-BMFF QuickTime video KDDI Video file MAF Photo Player ISO-BMFF MPEG-4 with AVC extension Apple MPEG-4 Part 14 audio Apple MPEG-4 Part 14 audio book Apple MPEG-4 Part 14 protected audio Apple MPEG-4 Part 14 video Sony PSP MPEG-4 MPEG-21 Mobile QuickTime video Motion JPEG 2000

image/jpm
image/jpx
video/quicktime video/3gpp2
video/mp4
audio/x-m4a audio/mp4 audio/mp4
video/x-m4v audio/mp4 audio/mp4 video/quicktime video/mj2

634

575

635

576

636

577

637

578

NTT MPEG-4 Nero MPEG-4 profile with AVC extension Nero AAC audio Nero MPEG-4 profile

video/mp4 video/mp4
audio/mp4 video/mp4

Extension HEIC HEIFS HEIF
3G2 3GP JP2 JPM JPX QT, MOV
M4A M4B M4P M4V MP4
MQV MJ2, MJP2

File Class adRASTERIMAGE
adRASTERIMAGE
adRASTERIMAGE adENCAPSULATION adMOVIE adMOVIE adRASTERIMAGE
adRASTERIMAGE
adRASTERIMAGE
adMOVIE adMOVIE adMISC adMOVIE
adSOUND adSOUND adSOUND
adMOVIE adSOUND adMISC adMOVIE adMOVIE
adMOVIE adMOVIE
adSOUND adMOVIE

Readers
mpeg4sr mpeg4sr jp2000sr, kpjp2000rdr jp2000sr, kpjp2000rdr jp2000sr, kpjp2000rdr MCI mpeg4sr
mpeg4sr
mpeg4sr mpeg4sr mpeg4sr
mpeg4sr mpeg4sr mpeg4sr MCI jp2000sr, kpjp2000rdr mpeg4sr
mpeg4sr

Format Name OMA_DRM_Fmt Panasonic_Camera_Fmt Ross_Video_Fmt SDA_Video_Fmt Samsung_Stereoscopic_Fmt Sony_XAVC_Fmt JPEG_2000_PGX_Fmt
Apple_Desktop_Services_Store_Fmt Core_Audio_Fmt VICAR_Fmt

Number 638 639 640 641 642 643 644
645 646 647

Category 579 580 581 582 583 584 585
586 587 588

Description

MIME Type

OMA DRM (ISOBMFF) Format

Panasonic Digital Camera image

Ross video

SDA SD Memory Card video

Samsung stereoscopic stream

Sony XAVC video

JPEG 2000 PGX Verification Model image

Apple Desktop Services Store file

Apple Core Audio Format

audio/x-caf

VICAR image format

FITS_Fmt DIF_Fmt MPEG_Transport_Stream_Fmt

648

589

649

590

650

591

Flexible Image Transport System FITS image
Digital Interface Format (DIF) DV video
MPEG Transport Stream data

image/fits video/MP2T

MPEG_Sequence_Fmt Ogg_OGM_Fmt Ogg_Speex_Fmt Ogg_Opus_Fmt Musepack_Audio_Fmt ART_Image_Fmt Vivo_Fmt QCP_Fmt CSP_Codec_Fmt TwinVQ_Fmt Interplay_MVE_Fmt IRIX_Moviemaker_Fmt
Sega_FILM_Fmt SMAF_Fmt

651

592

652

593

653

594

654

595

655

596

656

597

657

598

658

599

659

600

660

601

661

602

662

603

663

604

664

605

MPEG Sequence format Ogg OGM video format Ogg Speex audio format Ogg Opus audio format Musepack audio format ART image format Vivo audio-video format Qualcomm QCP audio Creative Signal Processor codec NTT TwinVQ audio format Interplay MVE video format IRIX Silicon Graphics moviemaker video file Sega FILM video format Synthetic music Mobile Application

video/mpeg video/ogg audio/ogg audio/ogg audio/x-musepack video/vnd.vivo audio/qcelp
video/x-sgi-movie
application/vnd.smaf

Extension
PGX DS_Store CAF IMG, MAP, VIC, VICAR FIT DV TS, M2T, M2TS, MTS
OGM SPX OGG MPC ART VIV QCP CSP VQF MVE MV, MOVIE CPK, CAK MMF

File Class adMISC adRASTERIMAGE adMOVIE adMOVIE adMISC adMOVIE adRASTERIMAGE
adMISC adSOUND adRASTERIMAGE
adRASTERIMAGE
adMOVIE
adMISC
adMISC adMOVIE adSOUND adSOUND adSOUND adRASTERIMAGE adMOVIE adSOUND adMISC adSOUND adMOVIE adMOVIE
adMOVIE adSOUND

Readers
mpeg4sr jp2000sr, kpjp2000rdr

Format Name
NIST_SPHERE_Fmt
Chinese_AVS_Fmt VQA_Fmt
YAFA_Fmt Origin_MVE_Fmt
BBC_Dirac_Fmt Maya_ASCII_Fmt RenderMan_Fmt
NOFF_Binary_Fmt VTK_ASCII_Fmt
VTK_Binary_Fmt
Wolfram_CDF_Fmt
Wolfram_Notebook_Fmt
HDF4_Fmt HDF5_Fmt ARMovie_Fmt Windows_TV_DVR_Fmt InstallShield_Z_Fmt MS_DirectDraw_Surface_Fmt
Bink_Fmt LZMA_Fmt True_Audio_Fmt Keepass_Fmt RPM_Fmt

Number
665
666 667
668 669
670 671 672
673 674
675
676
677
678 679 680 681 682 683
684 685 686 687 688

Category
606
607 608
609 610
611 612 613
614 615
616
617
618
619 620 621 622 623 624
625 626 627 628 629

Description Format NIST SPeech HEader REsources format Chinese AVS video format Westwood Studios Vector Quantized Animation video file Wildfire YAFA animation Origin Wing Commander III MVE movie format BBC Dirac video format Autodesk Maya ASCII file format Pixar RenderMan Interface Bytestream file NOFF 3D Object File Format Visualization Toolkit VTK ASCII format Visualization Toolkit VTK Binary format Wolfram Mathematica Computable Document Format Wolfram Mathematica Notebook Format Hierarchical Data Format HDF4 Hierarchical Data Format HDF5 Acorn RISC ARMovie video format Windows Television DVR format InstallShield Z archive format Microsoft DirectDraw Surface container format Bink audio-video container format LZMA compressed data format True Audio format Keepass Password file RPM Package Manager file

MIME Type
video/x-dirac
application/cdf application/x-hdf application/x-hdf application/x-compress application/x-lzma audio/x-tta application/x-rpm

Extension
NIST
VQA
YAFA MVE
DRC MA RIB
NOFF VTK
VTK
CDF
NB
HDF, H4 HDF, H5 RPL WTV Z DDS
BIK, BK2 LZMA TTA KDB, KDBX RPM

File Class

Readers

adSOUND
adMOVIE adANIMATION
adANIMATION adMOVIE
adMOVIE adCAD adVECTORGRAPHIC
adVECTORGRAPHIC adVECTORGRAPHIC
adVECTORGRAPHIC
adMISC
adMISC
adMISC adMISC adMOVIE adMOVIE adENCAPSULATION adENCAPSULATION
adMOVIE adENCAPSULATION adSOUND adMISC adENCAPSULATION

Format Name Printer_Font_Metrics_Fmt Adobe_Font_Metrics_Fmt Printer_Font_ASCII_Fmt Netware_Loadable_Module_Fmt TCPdump_pcap_Fmt
Multiple_Master_Font_Fmt TrueType_Font_Collection_Fmt Shapefile_Spatial_Index_Fmt Java_Key_Store_Fmt Java_JCE_Key_Store_Fmt Quark_Xpress_Intel_Fmt Windows_Imaging_Fmt
VMware_Virtual_Disk_Fmt XPConnect_Typelib_Fmt MS_DOS_Compression_Fmt
DLS_Fmt MS_Windows_Registry_Fmt Microsoft_Help_2_Fmt Qt_Translation_Fmt PEM_SSL_Certificate_Fmt

Number 689 690 691 692 693
694 695 696 697 698 699 700
701 702 703
704 705 706 707 708

Category 630 631 632 633 634
635 636 637 638 639 640 641
642 643 644
645 646 647 648 649

Description

MIME Type

Adobe Printer Font Metrics format application/x-font-printer-metric

Adobe Font Metrics ASCII format application/x-font-adobe-metric

Adobe Printer Font ASCII format

application/x-font-type1

Netware Loadable Module format

TCPdump packet stream capture savefile format

application/vnd.tcpdump.pcap

Adobe Multiple master font format

TrueType font collection format

application/x-font-ttf

Shapefile binary spatial index format application/x-shapefile

Java Key Store format

application/x-java-keystore

Java JCE Key Store format

application/x-java-jce-keystore

QuarkXPress Intel format

application/vnd.quark.quarkxpress

Microsoft Windows Imaging Format WIM

VMware Virtual Disk Format 5.0

application/x-vmdk

XPConnect Typelib Format

Microsoft MS-DOS installation compression (SZDD, KWAJ)

application/x-ms-compress

DLS Downloadable Sounds format

Microsoft Windows Registry format

Microsoft Help 2.0 format

application/x-ms-reader

Qt binary translation file format

PEM-encoded SSL certificate

application/pkix-cert

PostScript_Printer_Description_Fmt

709

650

Speedo_Font_Fmt InstallShield_Cabinet_Fmt InstallShield_Uninstall_Fmt MS_OEDBX_Folder_Fmt

710

651

711

652

712

653

713

654

LabVIEW_Fmt

714

655

Adobe PostScript Printer Description file

application/vnd.cups-ppd

Speedo Font format

InstallShield Cabinet Archive format

InstallShield Uninstall format

Outlook Express DBX folder database format

National Instruments LabVIEW file format

Extension PFM AFM PFA NLM PCAP

File Class adFONT adFONT adFONT adMISC adMISC

Readers
afmsr pfasr

MMM TTC SBX, SBN KS
QXB WIM

adFONT adFONT adGIS adMISC adMISC adDESKTOPPUBLSH adENCAPSULATION

VMDK XPT EX_

adMISC adMISC adENCAPSULATION

DLS
HXD, HXW, HXH QM CRT, PEM, CER, KEY PPD

adSOUND adMISC adENCAPSULATION adMISC adENCAPSULATION
adMISC

SPD CAB, HDR ISU DBX

adFONT adENCAPSULATION adENCAPSULATION adENCAPSULATION

VI

adMISC

Format Name SAP_Archive_SAR_Fmt
Netscape_Address_Book_Fmt Universal_3D_Fmt Open_Inventor_ASCII_Fmt Open_Inventor_Binary_Fmt X_Window_Dump_Fmt Git_Packfile_Fmt Xara_Xar_Fmt Internet_Archive_ARC_Fmt Applix_Builder_Fmt Applix_Bitmap_Fmt PEM_RSA_Private_Key_Fmt MIFF_Fmt Subversion_Dump_Fmt Virtual_Hard_Disk_Fmt Direct_Access_Archive_Fmt
Debian_Binary_Fmt XUL_Fastload_Fmt Nastran_OP2_Fmt Binary_Logging_Fmt Measurement_Data_Fmt Abaqus_ODB_Fmt Open_Diagnostic_Data_Exchange_Fmt Vector_ASCII_Fmt LSDYNA_State_Database_Fmt LSDYNA_Binary_Output_Fmt
MS_Power_BI_Fmt Tableau_Workbook_Fmt

Number 715
716 717 718 719 720 721 722 723 724 725 726 727 728 729 730
731 732 733 734 735 736 737
738 739 740
741 742

Category 656
657 658 659 660 661 662 663 664 665 666 667 668 669 670 671
672 673 674 675 676 677 678
679 680 681
682 683

Description SAP compression archive SAR format Netscape Address Book format Universal 3D file format Open Inventor ASCII format Open Inventor Binary format X Window Dump image Git Packfile format Xara X Xar image format Internet Archive ARC format Applix Builder format Applix Bitmap image format PEM-encoded RSA private key Magick Image File Format Subversion Dump format Microsoft Virtual Hard Disk format PowerISO Direct Access Archive format Debian binary package format Mozilla XUL Fastload format Nastran OP2 format CAD Binary Logging Format CAD Measurement Data Format Abaqus ODB Format Vector Open Diagnostic Data Exchange format Vector CAD ASCII ASC format LS-DYNA State Database format LS-DYNA binary output (binout) format Microsoft Power BI Desktop format Tableau Workbook format

MIME Type
image/x-xwindowdump application/vnd.xara application/x-ia-arc
application/x-vhd application/x-debian-package

Extension SAR
NAB U3D IV IV XWD PACK XAR ARC AB IM PEM MIF, MIFF
VHD DAA
DEB MFL OP2 BLF MDF ODB ODX
ASC
PBIX TWB

File Class adENCAPSULATION

Readers

adMISC adVECTORGRAPHIC adVECTORGRAPHIC adVECTORGRAPHIC adRASTERIMAGE adWORDPROCESSOR adVECTORGRAPHIC adENCAPSULATION adMISC adRASTERIMAGE adENCAPSULATION adRASTERIMAGE adENCAPSULATION adENCAPSULATION adENCAPSULATION

gitpacksr

adENCAPSULATION adMISC adCAD adCAD adCAD adCAD adCAD

xmlsr

adCAD adCAD adCAD

adANALYTICS adANALYTICS

pbixsr xmlsr

Format Name Tableau_Packaged_Workbook_Fmt

Number 743

Tableau_Extract_Fmt

744

Tableau_Data_Source_Fmt

745

Tableau_Packaged_Data_Source_Fmt 746

Tableau_Preferences_Fmt

747

Tableau_Map_Source_Fmt

748

ABAP_Fmt

749

AMPL_Fmt

750

APL_Fmt

751

ASN1_Fmt

752

ATS_Fmt

753

Agda_Fmt

754

Alloy_Fmt

755

Apex_Fmt

756

Arduino_Fmt

757

AsciiDoc_Fmt

758

AspectJ_Fmt

759

Awk_Fmt

760

BlitzMax_Fmt

761

Bluespec_Fmt

762

Brainfuck_Fmt

763

Brightscript_Fmt

764

CLIPS_Fmt

765

CMake_Fmt

766

COBOL_Fmt

767

Category 684
685 686 687
688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708

Description Tableau Packaged Workbook format Tableau Extract format Tableau Data Source format Tableau Packaged Data Source format Tableau Preferences format Tableau Map Source format ABAP Source Code4 AMPL Source Code4 APL Source Code4 ASN.1 Source Code4 ATS Source Code4 Agda Source Code4 Alloy Source Code4 Apex Source Code4 Arduino Source Code4 AsciiDoc Source Code4 AspectJ Source Code4 Awk Source Code4 BlitzMax Source Code4 Bluespec Source Code4 Brainfuck Source Code4 Brightscript Source Code4 CLIPS Source Code4 CMake Source Code4 COBOL Source Code4

CWeb_Fmt CartoCSS_Fmt Ceylon_Fmt

768

709

769

710

770

711

CWeb Source Code4 CartoCSS Source Code4 Ceylon Source Code4

MIME Type
text/x-abap
text/x-agda text/x-alloy text/x-arduino text/x-asciidoc text/x-aspectj text/x-awk text/x-bmx text/x-brainfuck
text/x-cmake text/x-cobol
text/x-ceylon

Extension TWBX

File Class adANALYTICS

TDE TDS TDSX

adANALYTICS adANALYTICS adANALYTICS

TPS TMS ABAP AMPL APL ASN
AGDA ALS CLS INO ASC AJ AWK BMX BSV B, BF BRS CLP CMAKE CBL, CCP, COB, CPY W MSS CEYLON

adANALYTICS adANALYTICS adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE
adSOURCECODE adSOURCECODE adSOURCECODE

Readers unzip
xmlsr unzip
xmlsr xmlsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr
afsr afsr afsr

Format Name Chapel_Fmt Clarion_Fmt Clean_Fmt Component_Pascal_Fmt Cool_Fmt Coq_Fmt Creole_Fmt Crystal_Fmt Csound_Fmt Csound_Document_Fmt Cuda_Fmt D_Fmt DIGITAL_Command_Language_Fmt
DTrace_Fmt Dart_Fmt E_Fmt ECL_Fmt Elm_Fmt Emacs_Lisp_Fmt EmberScript_Fmt Fantom_Fmt Forth_Fmt FreeMarker_Fmt Frege_Fmt G_code_Fmt GAMS_Fmt GAP_Fmt GDScript_Fmt GLSL_Fmt Game_Maker_Language_Fmt

Number 771 772 773 774 775 776 777 778 779 780 781 782 783
784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800

Category 712 713 714 715 716 717 718 719 720 721 722 723 724
725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741

Description Chapel Source Code4 Clarion Source Code4 Clean Source Code4 Component Pascal Source Code4 Cool Source Code4 Coq Source Code4 Creole Source Code4 Crystal Source Code4 Csound Source Code4 Csound Document Source Code4 Cuda Source Code4 D Source Code4 DIGITAL Command Language Source Code4 DTrace Source Code4 Dart Source Code4 E Source Code4 ECL Source Code4 Elm Source Code4 Emacs Lisp Source Code4 EmberScript Source Code4 Fantom Source Code4 Forth Source Code4 FreeMarker Source Code4 Frege Source Code4 G-code Source Code4 GAMS Source Code4 GAP Source Code4 GDScript Source Code4 GLSL Source Code4 Game Maker Language Source

MIME Type
text/x-component-pascal text/x-coq
text/x-cuda text/x-d
text/x-dart application/x-ecl text/x-elm text/x-emacs-lisp application/x-fantom text/x-forth
text/x-glslsrc

Extension CHPL CLW DCL, ICL CP CL V CREOLE CR ORC CSD CU DCL, ICL COM
D DART E ECL ELM EL EM FAN FOR, FORTH FTL FR G GMS
GD GLSL GML

File Class adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE
adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE

Readers afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr
afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr

Format Name
Gnuplot_Fmt Golo_Fmt Gosu_Fmt Gradle_Fmt GraphQL_Fmt Graphviz_DOT_Fmt HLSL_Fmt Hack_Fmt Haml_Fmt Handlebars_Fmt Hy_Fmt IDL_Fmt IGOR_Pro_Fmt Idris_Fmt Inform_7_Fmt Ioke_Fmt Isabelle_Fmt J_Fmt JSONiq_Fmt JSX_Fmt Jasmin_Fmt Jolie_Fmt Julia_Fmt KiCad_Layout_Fmt KiCad_Schematic_Fmt Kotlin_Fmt LFE_Fmt LOLCODE_Fmt Lasso_Fmt

Number
801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829

Category
742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770

Description Code4 Gnuplot Source Code4 Golo Source Code4 Gosu Source Code4 Gradle Source Code4 GraphQL Source Code4 Graphviz (DOT) Source Code4 HLSL Source Code4 Hack Source Code4 Haml Source Code4 Handlebars Source Code4 Hy Source Code4 IDL Source Code4 IGOR Pro Source Code4 Idris Source Code4 Inform 7 Source Code4 Ioke Source Code4 Isabelle Source Code4 J Source Code4 JSONiq Source Code4 JSX Source Code4 Jasmin Source Code4 Jolie Source Code4 Julia Source Code4 KiCad Layout Source Code4 KiCad Schematic Source Code4 Kotlin Source Code4 LFE Source Code4 LOLCODE Source Code4 Lasso Source Code4

MIME Type text/x-gnuplot text/x-gosu
text/x-haml text/x-hy text/x-idl text/ipf text/x-idris text/x-iokesrc text/x-isabelle text/x-j
text/x-julia
text/x-kotlin text/x-lasso

Extension
GNU, GP GOLO GS GRADLE GRAPHQL DOT HLSL
HAML HBS HY PRO IPF IDR I7X IK
IJS JQ JSX J
JL
SCH KT LFE LOL LAS, LASSO

File Class
adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE

Readers
afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr

Format Name Limbo_Fmt LiveScript_Fmt M_Fmt MAXScript_Fmt Markdown_Fmt Matlab_Fmt Max_Code_Fmt Mercury_Fmt Modelica_Fmt Modula_2_Fmt Monkey_Fmt Moocode_Fmt NL_Fmt NSIS_Fmt NetLogo_Fmt NewLisp_Fmt Nginx_Fmt Nix_Fmt Nu_Fmt OCaml_Fmt OpenCL_Fmt OpenEdge_ABL_Fmt OpenSCAD_Fmt Ox_Fmt Oxygene_Fmt Oz_Fmt PAWN_Fmt PLpgSQL_Fmt Pan_Fmt Parrot_Assembly_Fmt

Number 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859

Category 771 772 773 774 775 463 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799

Description Limbo Source Code4 LiveScript Source Code4 M Source Code4 MAXScript Source Code4 Markdown Source Code4 Matlab Source Code4 Max Source Code4 Mercury Source Code4 Modelica Source Code4 Modula-2 Source Code4 Monkey Source Code4 Moocode Source Code4 NL Source Code4 NSIS Source Code4 NetLogo Source Code4 NewLisp Source Code4 Nginx Source Code4 Nix Source Code4 Nu Source Code4 OCaml Source Code4 OpenCL Source Code4 OpenEdge ABL Source Code4 OpenSCAD Source Code4 Ox Source Code4 Oxygene Source Code4 Oz Source Code4 PAWN Source Code4 PLpgSQL Source Code4 Pan Source Code4 Parrot Assembly Source Code4

MIME Type text/limbo text/x-livescript
text/x-matlab
text/x-modelica text/x-modula2 text/x-monkey text/x-moocode text/x-nsis text/x-newlisp text/x-nginx-conf text/x-nix text/x-ocaml text/x-openedge
text/x-pawn text/x-plpgsql

Extension
LS M MS MD M MXT
MO MOD MONKEY MOO NL NSI NLOGO NL VHOST NIX NU
CL
SCAD OX OXYGENE OZ PWN PLSQL PAN PASM

File Class adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE

Readers afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr

Format Name PicoLisp_Fmt Pike_Fmt Pony_Fmt Processing_Fmt PureBasic_Fmt QMake_Fmt RAML_Fmt RDoc_Fmt REXX_Fmt Racket_Fmt Ragel_Fmt Rascal_Fmt Rebol_Fmt Red_Fmt RenPy_Fmt RenderScript_Fmt Ring_Fmt RobotFramework_Fmt SAS_Fmt SPARQL_Fmt SQL_Fmt SQLPL_Fmt SaltStack_Fmt Scheme_Fmt Scilab_Fmt Squirrel_Fmt Stan_Fmt Stata_Fmt Stylus_Fmt SuperCollider_Fmt

Number 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889

Category 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829

Description PicoLisp Source Code4 Pike Source Code4 Pony Source Code4 Processing Source Code4 PureBasic Source Code4 QMake File4 RAML Source Code4 RDoc Source Code4 REXX Source Code4 Racket Source Code4 Ragel Source Code4 Rascal Source Code4 Rebol Source Code4 Red Source Code4 Ren'Py Source Code4 RenderScript Source Code4 Ring Source Code4 RobotFramework Source Code4 SAS Source Code4 SPARQL format4 SQL format4 SQLPL Source Code4 SaltStack Source Code4 Scheme Source Code4 Scilab Source Code4 Squirrel Source Code4 Stan Source Code4 Stata Source Code4 Stylus Source Code4 SuperCollider Source Code4

MIME Type text/x-pike
text/x-rexx text/x-racket text/x-rebol text/x-red
text/x-robotframework application/sparql-query text/x-sql text/x-scheme text/scilab
text/supercollider

Extension
PIKE PONY PDE PB
RAML RDOC REXX
RSC REB, REBOL RED RPY RS RING ROBOT SAS
SLS
SCI NUT STAN
STYL SC

File Class adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE

Readers afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr

Format Name SystemVerilog_Fmt TXL_Fmt Turing_Fmt Turtle_Fmt UrWeb_Fmt Vim_script_Fmt Visual_Basic_Fmt WebAssembly_Fmt WebIDL_Fmt X10_Fmt XQuery_Fmt Xojo_Fmt Xtend_Fmt YANG_Fmt Zephir_Fmt eC_Fmt reStructuredText_Fmt xBase_Fmt Windows_Installer_Fmt Autodesk_3ds_Max_Fmt PhotoDraw_Mix_Fmt Softimage_SCN_Fmt Parasolid_XT_Fmt Parasolid_XB_Fmt IGES_Fmt
ACE_Archive_Fmt Grasshopper_GHX_Fmt MS_FrontPage_Macro_Fmt
MS_AtWork_Fax_Fmt

Number 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914
915 916 917
918

Category 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854
855 856 857
858

Description SystemVerilog Source Code4 TXL Source Code4 Turing Source Code4 Turtle Source Code4 UrWeb Source Code4 Vim script File4 Visual Basic Source Code4 WebAssembly Source Code4 WebIDL Source Code4 X10 Source Code4 XQuery Source Code4 Xojo Source Code4 Xtend Source Code4 YANG Source Code4 Zephir Source Code4 eC Source Code4 reStructuredText Source Code4 xBase Source Code4 MSI Windows Installer format Autodesk 3ds Max format PhotoDraw MIX image Softimage Scene SCN format Parasolid ascii XT format Parasolid binary XB format Initial Graphics Exchange Specification format ACE archive format Grasshopper GHX format Microsoft FrontPage macro file format Microsoft AtWork Fax format

MIME Type text/x-systemverilog
text/turtle text/x-vim text/x-vbasic
text/x-x10 text/xquery text/x-xtend
text/x-ecsrc text/x-rst application/x-ole-storage image/vnd.mix
model/iges application/x-ace-compressed

Extension SV TXL T TTL UR, URS VIM VB WAT WEBIDL X10 XQM
XTEND YANG ZEP EC
MSI MAX MIX SCN X_T X_B IGS
ACE GHX FPM
AWD

File Class adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adENCAPSULATION adCAD adRASTERIMAGE adCAD adCAD adCAD adCAD

Readers afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr olesr olesr olesr

adENCAPSULATION adCAD adWORDPROCESSOR

xmlsr

adFAXFORMAT

olesr

Format Name MS_Image_Composer_Fmt MS_Visual_InterDev_Fmt
Macromedia_Flash_FLA_OLE_Fmt
Corel_Draw_X4_Fmt Ogg_Daala_Fmt Ogg_BBC_Dirac_Fmt PKCS_7_Fmt Time_Stamped_Data_Fmt Sereal_Fmt Associated_Signature_Simple_Fmt
Associated_Signature_Extended_Fmt
iBooks_Fmt PDF_Forms_Data_Fmt PDF_XML_Forms_Data_Fmt AxCrypt_Fmt Unix_Archive_Fmt Berkeley_Btree_Database_Fmt Berkeley_Hash_Database_Fmt Berkeley_Log_Database_Fmt Berkeley_Queue_Database_Fmt BitTorrent_Fmt Chrome_Extension_Fmt Dalvik_Executable_Fmt Foxmail_Fmt GRIB_Fmt
Zstandard_Fmt LZ4_Fmt

Number 919 920
921
922 923 924 925 926 927 928
929
930 931 932 933 934 935 936 937 938 939 940 941 942 943
944 945

Category 859 860
861
862 863 864 865 866 867 868
869
870 871 872 873 874 875 876 877 878 879 880 881 882 883
884 885

Description

MIME Type

Microsoft Image Composer format

Microsoft Visual InterDev web project items file

Macromedia Flash FLA Project File OLE format

CorelDRAW version X4 onwards

application/x-vnd.corel.zcf.draw.document+zip

Ogg Daala video format

video/daala

Ogg BBC Dirac video format

video/x-dirac

PKCS #7 cryptographic format

application/pkcs7-signature

Time-stamped data format

application/timestamped-data

Sereal data serialization format

application/sereal

Associated Signature Container Simple format

application/vnd.etsi.asic-s+zip

Associated Signature Container Extended format

application/vnd.etsi.asic-e+zip

Apple iBooks format

application/x-ibooks+zip

PDF Forms Data Format

application/vnd.fdf

PDF XML Forms Data Format

application/vnd.adobe.xfdf

AxCrypt encrypted document

application/x-axcrypt

Unix Archive ar format

application/x-archive

Berkeley DB btree database format application/x-berkeley-db

Berkeley DB hash database format application/x-berkeley-db

Berkeley DB log database format application/x-berkeley-db

Berkeley DB queue database format application/x-berkeley-db

BitTorrent file format

application/x-bittorrent

Google Chrome Extension format application/x-chrome-package

Dalvik Executable dex format

application/x-dex

Foxmail email format

application/x-foxmail

General Regularly-distributed Information in Binary form GRIB format

application/x-grib

Zstandard compression format

application/zstd

LZ4 compressed file

application/x-lz4

Extension MIC WDM
FLA
CDRX OGV OGV P7S TSD SRL ASICS
ASICE
IBOOKS FDF XFDF AXX AR DB DB
TORRENT CRX DEX BOX GRB, GRIB2
ZSTD LZ4

File Class adRASTERIMAGE adSWDEV

Readers

adWORDPROCESSOR

adVECTORGRAPHIC adMOVIE adMOVIE adENCAPSULATION adENCAPSULATION adMISC adENCAPSULATION

pkcs7sr

adENCAPSULATION

adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adENCAPSULATION adENCAPSULATION adDATABASE adDATABASE adDATABASE adDATABASE adMISC adENCAPSULATION adEXECUTABLE adWORDPROCESSOR adSCIENTIFIC

epubsr xmlsr

adENCAPSULATION adENCAPSULATION

zstdsr

Format Name MS_Money_Fmt NetCDF_Fmt
SAS6_Data_Fmt SAS_Transport_Fmt Snappy_Framed_Fmt Stata_Data_Fmt SPSS_SAV_Fmt Zoo_Archive_Fmt CDX_Fmt CDXML_Fmt BPG_Fmt
Apple_Icon_Fmt NITF_Fmt
ERDAS_Imagine_Fmt MS_Office_Temporary_Owner_Fmt
EAC3_Audio_Fmt
COFF_Relocatable_Fmt
COFF_Executable_Fmt
COFF_Dynamic_Lib_Fmt
ELF_Core_Fmt Purify_Fmt Kryptel_Fmt Windows_Core_Dump_Fmt
Qt_Prerendered_Font_Fmt AIX_Relocatable_Fmt

Number 946 947
948 949 950 951 952 953 954 955 956
957 958
959 960
961
962
963
964
965 966 967 968
969 970

Category 886 887
888 889 890 891 892 893 894 895 896
897 898
899 900
901
902
903
904
905 906 907 908
909 910

Description

MIME Type

Microsoft Money format

application/x-msmoney

Network Common Data Form NetCDF format

application/x-netcdf

SAS 6 Data storage format

application/x-sas-data-v6

SAS Transport File XPORT format application/x-sas-xport

Snappy Framed compression format application/x-snappy-framed

Stata Data Format

application/x-stata-dta

SPSS Statistics Data File Format

Zoo Compressed Archive Format application/x-zoo

ChemDraw CDX format

chemical/x-cdx

ChemDraw CDXML format

application/vnd.chemdraw+xml

Better Portable Graphics BPG format

image/x-bpg

Apple Icon image format

image/icns

National Imagery Transmission Format NITF image

image/nitf

ERDAS Imagine image format

application/x-erdas-hfa

Microsoft Office temporary owner file

application/x-ms-owner

Enhanced-AC3 (EAC3) Audio File format

audio/eac3

Common Object File Format (COFF) relocatable object

application/x-object-file

Common Object File Format (COFF) executable

application/x-executable-file

Common Object File Format (COFF) dynamic library

application/x-library-file

ELF Core file

application/x-coredump

Rational Purify data file

Kryptel encrypted file

Windows heap or mini core dump file

application/x-dmp

Qt Prerendered Font format

AIX/RISC COFF relocatable object application/x-object-file

Extension MNY NC SD2 XPT, XPORT SZ DTA SAV ZOO CDX CDXML BPG ICNS NTF, NITF HFA, RRD, AUX
AC3 O
PFY EDC DMP QPF2

File Class adSPREADSHEET adMISC
adDATABASE adDATABASE adENCAPSULATION adDATABASE adDATABASE adENCAPSULATION adSCIENTIFIC adSCIENTIFIC adRASTERIMAGE
adRASTERIMAGE adRASTERIMAGE
adRASTERIMAGE adMISC
adSOUND
adOBJECTMODULE
adEXECUTABLE
adLIBRARY
adMISC adMISC adENCAPSULATION adMISC
adFONT adOBJECTMODULE

Readers xmlsr

Format Name AIX_Executable_Fmt AIX_Dynamic_Lib_Fmt HPUX_Relocatable_Fmt
HPUX_Executable_Fmt HPUX_Dynamic_Lib_Fmt
XML_EBCDIC_Fmt MPEG_JVT_H264_Fmt
Material_Exchange_Fmt
MS_Agent_Character_Fmt Quicken_Fmt MS_Outlook_Address_Fmt MS_Answer_Wizard_Fmt ADX_Fmt System_Deployment_Image_Fmt
Free_Lossless_Image_Fmt DPX_Fmt
Avro_Fmt InstallShield_Archive_Fmt
Mac_Executable_Fmt
GDSII_Fmt ActiveMime_Fmt
SmartCharts_Fmt Webex_ARF_Fmt
Webex_WRF_Fmt

Number 971 972 973
974 975
976 977
978
979 980 981 982 983 984
985 986
987 988
989
990 991
992 993
994

Category 911 912 913
914 915
916 917
918
919 920 921 922 923 924
925 926
927 928
929
930 931
932 933
934

Description

MIME Type

AIX/RISC COFF executable

application/x-executable-file

AIX/RISC COFF dynamic library

application/x-library-file

HPUX/PA-RISC COFF relocatable object

application/x-object-file

HPUX/PA-RISC COFF executable application/x-executable-file

HPUX/PA-RISC COFF dynamic library

application/x-library-file

EBCDIC-encoded XML file

application/xml

MPEG JVT-NAL sequence H264 video

video/h264

Material Exchange Format audiovideo container format

application/mxf

Microsoft Agent Character file

Quicken data file

Microsoft Outlook address file

Microsoft Answer Wizard file

ADX audio file

Microsoft System Deployment Image SDI format

Free Lossless Image Format (FLIF) image/flif

Digital Picture Exchange (DPX) image format

image/dpx

Apache Avro binary format

InstallShield archive (early versions) format

Mac OS-X (Mach-O) executable format

GDSII data format

Microsoft ActiveMime (mso) documents

application/x-mso

BizInt SmartCharts data format

Webex advanced network ARF recordings

Webex local WRF recordings

Extension
A
SL XML 264 MXF ACS QDF WAB
ADX SDI FLIF DPX AVRO EX_
GDS, GDS2 MSO CHP, CHRR ARF WRF

File Class adEXECUTABLE adLIBRARY adOBJECTMODULE

Readers

adEXECUTABLE adLIBRARY

adWORDPROCESSOR adMOVIE

adMOVIE

adMOVIE adMISC adMISC adMISC adSOUND adMISC

adRASTERIMAGE adRASTERIMAGE

adMISC adENCAPSULATION

avrosr

adEXECUTABLE

adCAD adMISC

gdsiisr

adMISC adMOVIE

adMOVIE

Format Name PGP_NetShare_Fmt
Ability_WP_OLE_Fmt Ability_SS_OLE_Fmt
InDesign_IDML_Fmt Executable_JAR_Fmt IDOL_IDX_Fmt Android_Package_Kit_Fmt Android_Binary_XML_Fmt
Java_WAR_Fmt Java_EAR_Fmt Atom_Syndication_Fmt RSS_Fmt SMIL_Fmt
XSLT_Fmt
XML_Shareable_Playlist_Fmt
FictionBook_Fmt Adobe_Premiere_Project_Fmt RDF_XML_Fmt Really_Simple_Discovery_Fmt
SBML_Fmt
SRU_Fmt
SSML_Fmt
PLS_Fmt
TEI_Fmt

Number 995
996 997
998 999 1000 1001 1002
1003 1004 1005 1006 1007
1008
1009
1010 1011 1012 1013
1014
1015
1016
1017
1018

Category 935
936 937
938 939 940 941 942
943 944 945 946 947
948
949
950 951 952 953
954
955
956
957
958

Description

MIME Type

Symantec PGP NetShare encrypted file

Ability Write later versions format

Ability Spreadsheet later versions format

Adobe InDesign IDML format

application/vnd.adobe.indesign-idml-package

Executable Java Archive (jar) file

application/java-archive

IDOL Server IDX file

Android Package Kit (APK) format application/vnd.android.package-archive

Android Binary XML (compressed by aapt) format

application/xml

Java WAR file format

Java EAR file format

Atom Syndication Format

application/atom+xml

RSS syndication XML format

application/rss+xml

Synchronized Multimedia Integration Language (SMIL) XML format

application/smil+xml

Extensible Stylesheet Language Transformations (XSLT) format

application/xslt+xml

XML Shareable Playlist Format (XSPF)

application/xspf+xml

FictionBook e-book XML format

application/x-fictionbook+xml

Adobe Premiere project format

image/vnd.adobe.premiere

RDF/XML format

application/rdf+xml

Really Simple Discovery (RSD) XML format

application/rsd+xml

Systems Biology Markup Language (SBML) XML format

application/sbml+xml

Search/Retrieve via URL (SRU) XML format

application/sru+xml

Speech Synthesis Markup Language (SSML) XML format

application/ssml+xml

Pronunciation Lexicon Specification (PLS) XML format

application/pls+xml

Text Encoding Initiative (TEI) XML

application/tei+xml

Extension
AWW AWS
IDML JAR IDX APK XML
WAR EAR ATOM RSS SMIL
XSL, XSLT
XSPF
FB2 PPJ RDF RSD
SBML
SRU
SSML
PLS
TEI

File Class adENCAPSULATION

Readers

adWORDPROCESSOR adSPREADSHEET

olesr

adDESKTOPPUBLSH adENCAPSULATION adENCAPSULATION adEXECUTABLE adWORDPROCESSOR

unzip

adENCAPSULATION adENCAPSULATION adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR

xmlsr xmlsr xmlsr

adWORDPROCESSOR xmlsr

adWORDPROCESSOR xmlsr

adWORDPROCESSOR adMISC adWORDPROCESSOR adWORDPROCESSOR

xmlsr
xmlsr xmlsr

adWORDPROCESSOR xmlsr

adWORDPROCESSOR xmlsr

adWORDPROCESSOR xmlsr

adWORDPROCESSOR xmlsr

adWORDPROCESSOR xmlsr

Format Name
METS_Fmt
MODS_Fmt
Metalink_Fmt Open_eBook_Fmt SRGS_Fmt
SPARQL_Results_Fmt Adobe_XML_Data_Package_Fmt ESzigno_Fmt Mozilla_XUL_Fmt
SyncML_Fmt
VoiceXML_Fmt TI_Target_Configuration_Fmt
LZFSE_Fmt
Kindle_eBook_Fmt
Oasis_Stream_Fmt
Amazon_KFX_Fmt KTX_Fmt GMSH_Mesh_Fmt Collada_DAE_Fmt
YIN_Fmt MPEG_Playlist_Fmt Windows_Audio_Playlist_Fmt DTS_Audio_Fmt

Number
1019
1020
1021 1022 1023
1024 1025 1026 1027
1028
1029 1030
1031
1032
1033
1034 1035 1036 1037
1038 1039 1040 1041

Category
959
960
961 962 963
964 965 966 967
968
969 970
971
972
973
974 975 976 977
978 979 980 981

Description

MIME Type

format

Metadata Encoding and Transmission Standard (METS) XML format

application/mets+xml

Metadata Object Description Schema (MODS) XML format

application/mods+xml

Metalink XML format

application/metalink4+xml

Open eBook (OEBPS) XML format application/oebps-package+xml

Speech Recognition Grammar Specification (SRGS) XML format

application/srgs+xml

SPARQL Query Results XML format application/sparql-results+xml

Adobe XML Data Package format application/vnd.adobe.xdp+xml

e-Szigno signed XML document

application/vnd.eszigno3+xml

Mozilla XML User Interface Language (XUL) XML format

application/vnd.mozilla.xul+xml

Synchronization Markup Language (SyncML) XML format

application/vnd.syncml+xml

VoiceXML (VXML) XML format

application/voicexml+xml

Texas Instruments CCXML target configuration XML format

Lempel-Ziv Finite State Entropy (LZFSE) compression format

Amazon Kindle or Mobipocket eBook format

application/vnd.amazon.ebook

Open Artwork System Interchange Standard (OASIS) format

Amazon KFX eBook format

KTX image format

image/ktx

GMSH Mesh polygon format

model/mesh

Collada Digital Asset Exchange (DAE) format

model/vnd.collada+xml

YIN XML format

application/yin+xml

MPEG audio playlist format

audio/mpegurl

Windows Audio playlist format

audio/x-ms-wax

DTS Coherent Acoustics audio

audio/vnd.dts

Extension
METS
MODS
METALINK OPF SRGS
SRX XDP ES3 XUL
XML
VXML CCXML
LZFSE
AZW, PRC
OAS
KFX KTX MSH DAE
YIN M3U WAX DTS

File Class

Readers

adWORDPROCESSOR xmlsr

adWORDPROCESSOR xmlsr

adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR

xmlsr xmlsr xmlsr

adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR

xmlsr xmlsr xmlsr xmlsr

adWORDPROCESSOR xmlsr

adWORDPROCESSOR adWORDPROCESSOR

xmlsr

adENCAPSULATION

adWORDPROCESSOR

adMISC

adWORDPROCESSOR adRASTERIMAGE adCAD adCAD

xmlsr

adWORDPROCESSOR adSOUND adSOUND adSOUND

xmlsr xmlsr

Format Name
Chemical_Markup_Language_Fmt
CrystalMaker_Fmt VTK_XML_Fmt
IPFIX_Fmt
Portable_Font_Resource_Fmt MARC_Fmt
MARC_XML_Fmt
XAR_Fmt Symbian_Installer_Fmt SO_Drawing_XML_Fmt
SO_Text_Global_XML_Fmt
ODF_Chart_Fmt ODF_Database_Fmt ODF_Image_Fmt ODF_Text_Master_Fmt ODF_Text_Web_Fmt ODF_Chart_Template_Fmt ODF_Formula_Template_Fmt ODF_Drawing_Template_Fmt ODF_Image_Template_Fmt ODF_Presentation_Template_Fmt
ODF_Spreadsheet_Template_Fmt
ODF_Text_Template_Fmt

Number
1042
1043 1044
1045
1046 1047
1048
1049 1050 1051
1052
1053 1054 1055 1056 1057 1058 1059 1060 1061 1062
1063
1064

Category
982
983 984
985
986 987
988
989 990 316
991
992 993 994 995 996 997 998 316 999 316
315
314

Description

MIME Type

Extension

format

Chemical Markup Language (CML) XML format

chemical/x-cml

CML

CrystalMaker chemical format

chemical/x-cmdf

CMDF

Visualization Toolkit VTK XML format

model/vnd.vtu

VTU

IP Flow Information Export (IPFIX) format

application/ipfix

IPFIX

Portable Font Resource font format application/font-tdpfr

PFR

Machine-Readable Cataloging (MARC21) format

application/marc

MARC

Machine-Readable Cataloging (MARC) XML format

application/marcxml+xml

XML

Extensible Archive (XAR) format

Symbian installer format

application/vnd.symbian.install

SIS

OpenDocument format (OpenOffice 1/StarOffice 6.7) Drawing XML

application/vnd.sun.xml.draw

SXD

OpenDocument format (OpenOffice 1/StarOffice 6.7) Writer Master document XML

application/vnd.sun.xml.writer.global

SXG

ODF Chart

application/vnd.oasis.opendocument.chart

ODC

ODF Database

application/vnd.sun.xml.base

ODB

ODF Image

application/vnd.oasis.opendocument.image

ODI

ODF Text Master

application/vnd.oasis.opendocument.text-master

ODM

ODF Text Web

application/vnd.oasis.opendocument.text-web

OTH

ODF Chart Template

application/vnd.oasis.opendocument.chart-template

OTC

ODF Formula Template

application/vnd.oasis.opendocument.formula-template OTF

ODF Drawing/Graphics Template application/vnd.oasis.opendocument.graphics-template OTG

ODF Image Template

application/vnd.oasis.opendocument.image-template OTI

ODF Presentation Template

application/vnd.oasis.opendocument.presentation-template

OTP

ODF Spreadsheet Template

application/vnd.oasis.opendocument.spreadsheet-template

OTS

ODF Text Template

application/vnd.oasis.opendocument.text-template

OTT

File Class

Readers

adWORDPROCESSOR xmlsr

adSCIENTIFIC adVECTORGRAPHIC

xmlsr

adMISC

adFONT adDATABASE

adWORDPROCESSOR xmlsr

adENCAPSULATION adENCAPSULATION adVECTORGRAPHIC

kpodfrdr

adWORDPROCESSOR

adVECTORGRAPHIC adDATABASE adRASTERIMAGE adWORDPROCESSOR adWORDPROCESSOR adVECTORGRAPHIC adWORDPROCESSOR adVECTORGRAPHIC adRASTERIMAGE adPRESENTATION

odfwpsr odfwpsr
unzip kpodfrdr
kpodfrdr

adSPREADSHEET

odfsssr

adWORDPROCESSOR odfwpsr

Format Name ODF_Chart_XML_Fmt ODF_Drawing_XML_Fmt
ODF_Formula_XML_Fmt ODF_Image_XML_Fmt ODF_Presentation_XML_Fmt ODF_Spreadsheet_XML_Fmt ODF_Text_XML_Fmt ODF_Extension_Fmt StarView_Metafile_Fmt
BBeB_LRF_eBook_Fmt
GPG_Trust_DB_Fmt VICE_Emulator_Fmt
Portable_Game_Notation_Fmt
Doom_WAD_Fmt Device_Tree_Blob_Fmt BDF_Font_Fmt PC_Screen_Font_Fmt JNLP_Fmt XAML_Browser_Application_Fmt
MS_Binder_Fmt XAP_Fmt
StuffIt_X_Fmt FIG_Fmt
XPInstall_Fmt
XDF_Fmt

Number 1065 1066
1067 1068 1069 1070 1071 1072 1073
1074
1075 1076
1077
1078 1079 1080 1081 1082 1083
1084 1085
1086 1087
1088
1089

Category 1000 1001
1002 1003 1004 1005 1006 1007 1008
1009
1010 1011
1012
1013 1014 1015 1016 1017 1018
1019 1020
1021 1022
1023
1024

Description

MIME Type

ODF Chart flat XML format

application/vnd.oasis.opendocument.chart.xml

ODF Drawing/Graphics flat XML format

application/vnd.oasis.opendocument.formula.xml

ODF Formula flat XML format

application/vnd.oasis.opendocument.graphics.xml

ODF Image flat XML format

application/vnd.oasis.opendocument.image.xml

ODF Presentation flat XML format application/vnd.oasis.opendocument.presentation.xml

ODF Spreadsheet flat XML format application/vnd.oasis.opendocument.spreadsheet.xml

ODF Text flat XML format

application/vnd.oasis.opendocument.text.xml

ODF Extension format

application/vnd.openofficeorg.extension

OpenOffice StarView MetaFile format

image/x-svm

Broad Band eBook (BBeB) in LRF format

application/x-ext-lrf

GPG trust database format

VICE (Versatile Commodore Emulator) format

Portable Game Notation chess format

application/vnd.chess-pgn

Doom IWAD/PWAD format

application/x-doom

Linux Device Tree Blob format

Glyph Bitmap Distribution Format application/x-font-bdf

PC Screen Font format

application/x-font-psf

Java Network Launching Protocol application/x-java-jnlp-file

XAML Browser Application (XBAP) format

application/x-ms-xbap

Microsoft Office Binder format

application/x-msbinder

Microsoft Silverlight application (XAP) format

application/x-silverlight-app

StuffIt X (SITX) archive format

application/x-stuffitx

Facility for Interactive Generation of figures (FIG) image format

application/x-xfig

XPInstall Cross-Platform Installer Module (XPI) format

application/x-xpinstall

Extensible Data Format (XDF) XML

Extension FODC FODG
FODF FODI FODP FODS FODT OXT SVM
LRF
GPG VSF
PGN
WAD DTB BDF PSF JNLP XBAP
OBP XAP
SITX FIG
XPI
XDF

File Class adVECTORGRAPHIC adWORDPROCESSOR

Readers

adVECTORGRAPHIC adRASTERIMAGE adPRESENTATION adSPREADSHEET adWORDPROCESSOR adMISC adRASTERIMAGE

adWORDPROCESSOR

adMISC adMISC

adWORDPROCESSOR

adMISC adMISC adFONT adFONT adWORDPROCESSOR adWORDPROCESSOR

xmlsr xmlsr

adENCAPSULATION adENCAPSULATION

olesr

adENCAPSULATION adVECTORGRAPHIC

adENCAPSULATION

adWORDPROCESSOR xmlsr

Format Name
MXML_Fmt
MusicXML_Fmt Finale_Fmt Spotfire_DXP_Fmt MS_Office_Theme_2007_Fmt Adobe_AIR_Installer_Fmt
Flex_Project_Fmt FoxPro_Fmt VST_Preset_Fmt
Mischief_Image_Fmt
FreeArc_Fmt Autodesk_3ds_Fmt Monkeys_Audio_Fmt CALS_Fmt Dr_Halo_PAL_Fmt DPG_Fmt JPEG_XR_Fmt
TCR_eBook_Fmt
IHEX_Fmt QCOW_Fmt VDI_Fmt OneNote_Alternate_Fmt
RMS_Protected_Fmt
Portfolio_PDF_Fmt

Number
1090
1091 1092 1093 1094 1095
1096 1097 1098
1099
1100 1101 1102 1103 1104 1105 1106
1107
1108 1109 1110 1111
1112
1113

Category
1025
1026 1027 1028 1029 1030
1031 1032 1033
1034
1035 1036 1037 1038 1039 1040 1041
1042
1043 1044 1045 1046
1047
1048

Description

MIME Type

format

MXML UI markup language XML format

MusicXML format

application/vnd.recordare.musicxml

Finale audio format

TIBCO Spotfire DXP data format

application/vnd.spotfire.dxp

Microsoft Office theme format

application/vnd.ms-officetheme

Adobe AIR application installer package

application/vnd.adobe.air-application-installer-package+zip

Adobe Flash Flex project file format application/vnd.adobe.fxp

FoxPro compiled source format

Virtual Studio Technology (VST) preset format

Mischief vector graphics image format

FreeArc archive format

application/x-freearc

Autodesk 3ds format

application/x-3ds

Monkey's Audio format

CALS raster image format

Dr Halo raster image PAL file format

Nintendo DS DPG video format

JPEG XR (extended range) image format

image/vnd.ms-photo

TCR/ZVR (Text Compression for Reader) eBook format

Intel Hex format

QEMU Copy On Write

VirtualBox Disk Image

OneNote Alternative Packaging Format

Rights Management Services (RMS)-protected format

Portfolio PDF File

application/pdf

Extension
MXML
MXL MUS DXP THMX AIR
FXP FXP FXP
ART
ARC 3DS APE CAL PAL DPG JXR, HDP
TCR, ZVR
IHEX QCOW VDI
PFILE, PPDF, PJPG, PTXT PDF

File Class

Readers

adWORDPROCESSOR xmlsr

adENCAPSULATION adSOUND adANALYTICS adMISC adENCAPSULATION

xmlsr

adENCAPSULATION adLIBRARY adSOUND

adVECTORGRAPHIC

adENCAPSULATION adCAD adSOUND adRASTERIMAGE adRASTERIMAGE adMOVIE adRASTERIMAGE

adWORDPROCESSOR

adENCAPSULATION adENCAPSULATION adENCAPSULATION adWORDPROCESSOR

onealtsr

adWORDPROCESSOR pfilesr

adWORDPROCESSOR pdfsr
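
The values in the Number column can be mapped back to format names in application code. The following is a minimal sketch only: FormatTable and nameOf are hypothetical names chosen for illustration and are not part of the KeyView API. The four entries are copied from the rows above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the KeyView API.
public class FormatTable {
    // A few Number -> Format Name pairs copied from the table above.
    private static final Map<Integer, String> NAMES = new HashMap<>();
    static {
        NAMES.put(1090, "MXML_Fmt");
        NAMES.put(1100, "FreeArc_Fmt");
        NAMES.put(1106, "JPEG_XR_Fmt");
        NAMES.put(1113, "Portfolio_PDF_Fmt");
    }

    // Returns the format name for a number, or a placeholder if the
    // number is not in this (deliberately partial) table.
    public static String nameOf(int number) {
        return NAMES.getOrDefault(number, "unknown (" + number + ")");
    }

    public static void main(String[] args) {
        System.out.println(nameOf(1113));
        System.out.println(nameOf(42));
    }
}
```

A lookup of this kind is only as complete as the entries you load into it, so an application would normally fall back to a neutral label for any number it does not recognise, as nameOf does here.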

Format Name Crystal_Reports_Fmt Thumbs_db_Fmt
PagePlus_Fmt MS_Project_Exchange_Fmt MS_Management_Pack_MPX_Fmt
AutoCAD_VBA_Project_Fmt PLY_ASCII_Fmt
PLY_Binary_Fmt
JavaView_JVX_Fmt X3D_Fmt
ZBrush_Project_Fmt ZBrush_Tool_Fmt Windows_Installer_Patch_Fmt
Windows_Installer_Transform_Fmt
Lotus_Approach_Fmt Outlook_SendRcv_Settings_Fmt
MS_Publisher_Scheme_Fmt SO_Chart_Fmt SO_Database_Fmt SO_Library_Fmt PageMaker_Document_Fmt MS_DTS_Fmt
Cognos_PowerPlay_PPR_Fmt
Visual_Studio_SUO_Fmt

Number 1114 1115
1116 1117 1118
1119 1120
1121
1122 1123
1124 1125 1126
1127
1128 1129
1130 1131 1132 1133 1134 1135
1136
1137

Category 1049 1050
1051 1052 1053
1054 1055
1056
1057 1058
1059 1060 1061
1062
1063 1064
1065 1066 1067 1068 1069 1070
1071
1072

Description

MIME Type

SAP Crystal Reports format

application/x-rpt

Microsoft Windows thumbs.db format

Serif PagePlus format

Microsoft Project Exchange format

Microsoft System Center Operations Manager (SCOM) management pack MPX format

AutoCAD VBA project format

Polygon File Format (PLY) ASCII format

Polygon File Format (PLY) binary format

JavaView XML (JVX) format

Extensible 3D Graphics (X3D) XML format

model/x3d+xml

ZBrush ZProject (ZPR) format

ZBrush ZTool (ZTL) format

Microsoft Windows Installer Patch Package (MSP) format

Microsoft Windows Installer Transform (MST) format

Lotus Approach format

application/vnd.lotus-approach

Microsoft Outlook 2002 SendReceive Settings

Microsoft Publisher colour scheme

Star Office 4,5 Chart

application/vnd.stardivision.chart

Star Office 4,5 Database

application/vnd.stardivision.base

Star Office 4,5 Library

Adobe PageMaker document

application/pagemaker

Microsoft Data Transformation Services (DTS) package file

Cognos PowerPlay up to version 7 (PPR) format

Microsoft Visual Studio solution user

Extension RPT DB
PPP MPX MPX
DVB PLY
PLY
JVX X3D
ZPR ZTL MSP
MST
APR, MPR SRS
SCM SDS SDB SBL PMD DTS
PPR
SUO

File Class adANALYTICS adENCAPSULATION

Readers olesr

adDESKTOPPUBLSH adSCHEDULE adMISC

olesr xmlsr

adMISC adCAD

adCAD

adCAD adCAD

xmlsr

adCAD adCAD adENCAPSULATION

olesr

adENCAPSULATION

adDATABASE adMISC

adMISC adVECTORGRAPHIC adDATABASE adLIBRARY adDESKTOPPUBLSH adMISC

olesr olesr

adANALYTICS

adSWDEV

Format Name
MS_GraphEdit_Fmt ArcGIS_Graph_Fmt SID_Audio_Fmt MrSID_Fmt Cardfile_Fmt
MS_Word_Mac_4_Fmt
WordPerfect_5_Fmt WordPerfect_6_Fmt
WordPerfect_Graphics_1_Fmt
Organization_Chart_Fmt Lotus_Organizer_Fmt
MS_DBML_Fmt
XMind_Fmt MSI_Cerius_Fmt
GenBank_Fmt
GIS_World_File_Fmt

Number
1138 1139 1140 1141 1142
1143
1144 1145
1146

Category
1073 1074 1075 1076 1077
205
80 178
85

Description

MIME Type

options (suo) file

Microsoft GraphEdit File format

ArcGIS Graph format

SID Audio format

audio/prs.sid

LizardTech MrSID image format

image/x-mrsid

Microsoft Windows Cardfile address book format

application/x-mscardfile

Microsoft Word for Macintosh (version 4,5)

application/msword

WordPerfect (version 5)

application/x-corel-wordperfect

Corel WordPerfect (version 6 and higher)

application/x-corel-wordperfect

WordPerfect Graphics (version 1)

application/vnd.wordperfect

1147 1148

1078 1079

OrgPlus Organization Chart

application/orgplus

Lotus Organizer documents

application/vnd.lotus-organizer

1149 1150 1151 1152 1153

1080 1081 1082 1083 1084

Microsoft Database Markup Language XML document

XMind document

application/xmind

MSI Cerius chemical formula document

chemical/x-cerius

GenBank DNA character sequence document

chemical/x-genbank

ESRI GIS World file

GIS_Projection_Metadata_Fmt PowerWorld_Binary_Fmt PowerWorld_Display_Fmt ArcXML_Fmt
GAMS_GDX_Fmt

1154 1155 1156 1157
1158

1085 1086 1087 1088
1089

ESRI Projection Metadata (PRJ) file
PowerWorld Binary (PWB) file
PowerWorld Display (PWD) file
ESRI ArcIMS project XML file (ArcXML)
General Algebraic Modeling System (GAMS) Data Exchange (GDX) format

Extension

File Class

Readers

GRF GRF SID SID CRD

adMISC adGIS adSOUND adRASTERIMAGE adWORDPROCESSOR

DOC

adWORDPROCESSOR mbsr

WOP, DOC WPD

adWORDPROCESSOR adWORDPROCESSOR

wosr wp6sr

WPG, QPG
OPX OR2, OR3, OR4, OR5, OR6 DBML

adRASTERIMAGE, adVECTORGRAPHIC adDATABASE adSCHEDULE
adWORDPROCESSOR

XMIND MSI

adPRESENTATION adSCIENTIFIC

GB

adSCIENTIFIC

BPW, GFW, JGW, J2W, PGW, SDW, TFW, WLD

adGIS

afsr

PRJ

adGIS

PWB

adCAD

PWD

adCAD

AXL

adGIS

GDX

adSCIENTIFIC

Format Name ArcMap_MXD_Fmt
RRDtool_Fmt
HWPX_Fmt SolidWorks_2015_Fmt

Number 1159
1160
1161 1162

Category 1090
1091
1092 1093

Description
ArcMap Map Exchange Document project (MXD)
RRDtool (Round Robin Database) data file
Hangul HWPX document
SolidWorks (2015 onwards) file

MIME Type application/hwp+zip

MS_Photo_Editor_Fmt
MS_Word_HTML_Fmt MS_Excel_HTML_Fmt Portable_FloatMap_Fmt RGBE_Fmt

1163
1164 1165 1166 1167

1094
1095 1096 1097 1098

Microsoft Photo Editor 'embedded GIF' file

application/vnd.ms-photo-editor

Microsoft Word HTML format

Microsoft Excel HTML format

Portable FloatMap (PFM) image

image/x-portable-floatmap

Radiance RGBE (HDR) image

image/vnd.radiance

APNG_Fmt

1168

Enhanced_Compressed_Wavelet_Fmt 1169

Ensoniq_Waveset_Fmt Corel_Photo_Paint_Fmt

1170 1171

OpenRaster_Fmt Krita_Fmt Gerber_Fmt PGML_Fmt

1172 1173 1174 1175

Away3D_Fmt CAD_3MF_Fmt

1176 1177

AMF_Fmt

1178

C3D_Fmt CAD_3DSystems_BFF_Fmt

1179 1180

NRRD_Fmt

1181

1099
1100
1101 1102
1103 1104 1105 1106
1107 1108
1109
1110 1111
1112

Animated Portable Network Graphics (Animated-PNG)

image/apng

Enhanced Compressed Wavelet image

image/ecw

Ensoniq Waveset audio data file

Corel Photo Paint (version 7 and higher)

image/x-corelphotopaint

OpenRaster image

image/openraster

Krita image

application/x-krita

Gerber image format

application/vnd.gerber

Precision Graphics Markup Language

Away3D scene file

3D Manufacturing Format document

application/vnd.ms-package.3dmanufacturing-3dmodel+xml

Additive manufacturing file format (AMF) document

application/x-amf

Coordinate 3D (C3D) format

3D Sprint (3D Systems) SLA Build file

NRRD (nearly raw raster data)

Extension MXD

File Class adGIS

Readers

RRD

adDATABASE

HWPX

adWORDPROCESSOR

SLDPRT, SLDDRW, SLDASM

adCAD

adRASTERIMAGE

hwpxsr

DOC, HTM XLS, HTM PFM HDR, PIC, RGBE, XYZE APNG, PNG

adWORDPROCESSOR adWORDPROCESSOR adRASTERIMAGE adRASTERIMAGE

htmlsr htmlsr

adANIMATION

kppngrdr

ECW

adRASTERIMAGE

ECW CPT

adSOUND adRASTERIMAGE

ORA KRA GBR PGML

adRASTERIMAGE adRASTERIMAGE adVECTORGRAPHIC adVECTORGRAPHIC

xmlsr

AWD 3MF

adCAD adCAD

AMF

adCAD

xmlsr

C3D BFF

adCAD adCAD

NRRD

adRASTERIMAGE

Format Name
Cinema_4D_Fmt FBX_ASCII_Fmt FBX_Binary_Fmt Wavefront_OBJ_Fmt
Wavefront_MTL_Fmt
MS_Power_BI_Template_Fmt
Windows_Sticky_Notes_Fmt
BlakHole_Fmt PowerArchiver_Fmt
PageMagic_Fmt PIM_Archiver_Fmt Softdisk_Text_Compressor_Fmt Ability_PhotoPaint_Fmt Softlib_Fmt Timeworks_Publisher_Fmt
Scribe_Fmt
SQLite_Write_Ahead_Log_Fmt SQLite_WAL_Index_Fmt AutoForm_Design_Fmt TSV_Fmt OpenStreetMap_XML_Fmt OpenStreetMap_PBF_Fmt
Nero_Audio_Compilation_Fmt Nero_ISO_Compilation_Fmt

Number
1182 1183 1184 1185
1186
1187
1188
1189 1190
1191 1192 1193 1194 1195 1196
1197
1198 1199 1200 1201 1202 1203
1204 1205

Category
1113 1114 1115 1116
1117
1118
1119
1120 1121
1122 1123 1124 1125 1126 1127
1128
1129 1130 1131 1132 1133 1134
1135 1136

Description

MIME Type

image format

Cinema 4D model

Kaydara FBX project (ASCII)

Kaydara FBX project (binary)

Wavefront OBJ geometry definition file

Wavefront Material Template Library (MTL)

Microsoft Power BI Desktop template format

Microsoft Windows Sticky Notes format

BlakHole compression format

PowerArchiver PA compression format

NEBS PageMagic format

PIM Archiver format

Softdisk Text Compressor format

Ability Office PhotoPaint image

Softdisk Softlib compression format

Timeworks Publisher (Publish It) format

Scribe markup language and word processing system

SQLite Write-Ahead Log file

SQLite WAL-index (shm) file

AutoForm Design file

Tab-separated values (TSV) file

text/tab-separated-values

OpenStreetMap XML data

OpenStreetMap Protocolbuffer Binary Format data file (.osm.pbf)

Nero Audio-CD compilation file

Nero ISO compilation file

Extension
C4D FBX FBX OBJ
MTL
PBIT
SNT
BH PA
DTP PIM CTX APX SLB DTP
MSS
WAL SHM AFD TSV, TAB OSM PBF
NRA NRI

File Class

Readers

adCAD adCAD adCAD adCAD

adCAD

adANALYTICS

adWORDPROCESSOR

adENCAPSULATION adENCAPSULATION

adDESKTOPPUBLSH adENCAPSULATION adENCAPSULATION adRASTERIMAGE adENCAPSULATION adDESKTOPPUBLSH

olesr

adWORDPROCESSOR afsr

adDATABASE adDATABASE adCAD adWORDPROCESSOR adGIS adGIS

afsr, afsr

adMISC adMISC

Format Name WordStar_for_Windows_Fmt MS_Outlook_PAB_Fmt HLSL_FXO_Fmt
HLSL_CSO_Fmt
Oberon_Document_Fmt Oberon_Symbol_Fmt Oberon_Code_Fmt Python_Bytecode_Fmt PCPaint_Fmt PCRaster_Map_Fmt COM_Type_Library_Fmt MS_Visual_C_Export_Fmt Lotus_Organizer_Report_Fmt Audible_Audiobook_AA_Fmt DOS_RED_Fmt CA_ZIPXP_Fmt Kindle_Topaz_Fmt Windows_Shim_Database_Fmt MS_Incremental_Linker_Fmt Lotus_Smart_Icon_Fmt Lotus_Organizer_Layout_Fmt CMZ_Fmt

Number 1206 1207 1208
1209
1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227

Category 1137 1138 1139
1140
1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158

Description

MIME Type

WordStar for Windows file

Microsoft Outlook Personal Address Book (PAB)

DirectX High-Level Shader Language (HLSL) pre-compiled shader

DirectX High-Level Shader Language (HLSL) compiled shader object

Component Pascal / Oberon Document file

Component Pascal / Oberon Symbol file

Component Pascal / Oberon Code (executable and loadable object) file

Python compiled bytecode

application/x-bytecode.python

PCPaint / Pictor Paint image format

PCRaster Map / Cross System Format geographical data

Microsoft Component Object Model (COM) Type library

Microsoft Visual C++ Export file

Lotus Organizer report document

Audible Audiobook (AA) file

audio/audible

MS-DOS RED installer library format

CA Technologies ZIPXP compressed document

Amazon Kindle Topaz eBook

Microsoft Windows Shim Database file

Microsoft Visual Studio incremental linker file

Lotus Smart Icon image file

Lotus Organizer print/paper layout file

CMZ compression format

Extension WSD PAB
FXO

File Class adWORDPROCESSOR adMISC

Readers stringssr

adCAD

CSO

adCAD

ODC

adSOURCECODE

OSF

adOBJECTMODULE

OCF

adEXECUTABLE

PYC PIC MAP, CSF

adEXECUTABLE adRASTERIMAGE adGIS

TLB

adLIBRARY

EXP REP AA RED CAZ

adLIBRARY adSCHEDULE adSOUND adLIBRARY adENCAPSULATION

AZW, AZW1, TPZ SDB

adWORDPROCESSOR adDATABASE

ILK

adSWDEV

SMI

adRASTERIMAGE

PLT

adSCHEDULE

CMZ

adENCAPSULATION

Format Name RFFlow_Fmt InstallShield_Script_Fmt InstallShield_Rules_Fmt Windows_FTS_Fmt
DVD_Info_Fmt Emacs_Lisp_Bytecode_Fmt
Windows_Resource_Fmt
MS_Precompiled_Header_Fmt
Borland_Turbo_Project_Fmt PS_Font_Descriptor_Fmt MySQL_Index_Fmt MS_SQL_Fmt
DNL_eBook_Fmt GD_Image_Fmt ITunes_Library_Fmt MS_SQM_Fmt
VIFF_Fmt
JBIG_Fmt CodeWarrior_Project_Fmt PaintShop_Pro_JBF_Fmt Delphi_Diagram_Portfolio_Fmt Adobe_Swatch_Exchange_Fmt ASCII_Scene_Exporter_Fmt
AVR_Fmt Winamp_AVS_Fmt

Number 1228 1229 1230 1231
1232 1233
1234
1235
1236 1237 1238 1239
1240 1241 1242 1243
1244
1245 1246 1247 1248 1249 1250
1251 1252

Category 1159 1160 1161 1162
1163 1164
1165
1166
1167 1168 1169 1170
1171 1172 1173 1174
1175
1176 1177 1178 1179 1180 1181
1182 1183

Description

MIME Type

RFFlow flowchart document

InstallShield script document

InstallShield Compiled Rules file

Microsoft Windows 95/NT help fulltext-search file

DVD Information (IFO) file

content/dvd

Byte-compiled Lisp (Emacs/XEmacs)

application/x-bytecode.elisp

Microsoft Windows binary resource file

Microsoft Visual C/C++ binary precompiled header

Borland Turbo C project file

PostScript binary Font Descriptor file

MySQL MyISAM Table index

Microsoft SQL Server primary database file

DNAML DNL eBook

GD Library image

Apple iTunes music library

Microsoft Windows Live Messenger/Mail log file

Khoros Visualization Image File Format (VIFF)

image/x-viff

JBIG (JBIG1) image

image/jbig

CodeWarrior C/C++ project

PaintShop Pro JBF image cache file

image/jbf

Delphi Diagram Portfolio file

Adobe Swatch Exchange Format

Autodesk 3ds Max ASCII Scene Exporter file

AVR (Audio Visual Research) format

Winamp AVS (Advanced Visualization Studio) plug-in file

Extension FLO INS INX FTS
IFO ELC
RES
PCH
PRJ NTF MYI MDF
DNL GD, GD2 ITL SQM
XV, VIF, VIFF
JBG, JBIG, BIE MCP JBF DDP ASE, ASEF ASE
AVR AVS

File Class adPRESENTATION adENCAPSULATION adENCAPSULATION adDATABASE

Readers

adDATABASE adEXECUTABLE

adMISC

adSWDEV

adSWDEV adFONT adDATABASE adDATABASE

adWORDPROCESSOR adRASTERIMAGE adDATABASE adMISC

adRASTERIMAGE

adRASTERIMAGE adSWDEV adMISC adMISC adRASTERIMAGE adCAD

adSOUND adSOUND

Format Name After_Effects_Project_Fmt Anfy_Applet_Generator_Fmt SmartCipher_Fmt General_Exchange_Fmt Maxis_XA_Fmt NUT_Fmt OpenMG_Audio_Fmt
TXD_Fmt
DFA_Fmt FunCom_ISS_Fmt Sony_MSV_Fmt
THP_Fmt Smush_Animation_Fmt
SIFF_Audio_Fmt SNES_SPC_Fmt Sierra_VMD_Fmt VTech_MJP_Fmt Nullsoft_Video_Fmt Shorten_Fmt Leitch_Video_Fmt
ETV_Fmt TAK_Audio_Fmt Maelstrom_ANM_Fmt SW_ANM_Fmt DeluxePaint_Animation_Fmt Crack_Art_Fmt Time_Shift_Video_Fmt

Number 1253 1254 1255 1256 1257 1258 1259
1260
1261 1262 1263
1264 1265
1266 1267 1268 1269 1270 1271 1272
1273 1274 1275 1276 1277 1278 1279

Category 1184 1185 1186 1187 1188 1189 1190
1191
1192 1193 1194
1195 1196
1197 1198 1199 1200 1201 1202 1203
1204 1205 1206 1207 1208 1209 1210

Description

MIME Type

Adobe After Effects project

Anfy (Java) Applet Generator file

SmartCipher encrypted file

General Exchange Format (GXF)

application/gxf

Maxis XA audio file

NUT Open Container Format

Sony OpenMG Audio (OMA) container file

Renderware Texture Dictionary (TXD) file

DreamForge DFA FMV format

FunCom ISS audio

Sony Compressed Audio (MSV/DVF)

GameCube THP Video

LucasArts Smush SAN Animation Format

Beam Software SIFF audio file

SNES SPC700 audio file

Sierra Video and Music Data format

VTech MJP video format

Nullsoft Video format (NSV)

Shorten audio file

Leitch Exchange Format video (LXF)

ETV video file

TAK audio file

Maelstrom ANM animation

Savage Warriors ANM animation

DeluxePaint animation

Crack Art image

Time Shift Video (TSV) format

Extension AEP AJP
GXF XA NUT OMA, OMG
TXD
DFA ISS DVF, ICS, MSV
THP SAN, NUT
SON SPC VMD MJP NSV SHN LXF
ETV TAK ANM ANM ANM CA1, CA2, CA3 TSV

File Class adMOVIE adMISC adENCAPSULATION adMOVIE adSOUND adMOVIE adSOUND
adRASTERIMAGE
adMOVIE adSOUND adSOUND
adMOVIE adANIMATION
adSOUND adSOUND adMOVIE adMOVIE adMOVIE adSOUND adMOVIE
adMOVIE adSOUND adANIMATION adANIMATION adANIMATION adRASTERIMAGE adMOVIE

Readers

Format Name XBV_Fmt HNM4_Fmt HNM6_Fmt NXV_Fmt VP5_Fmt FutureVision_FST_Fmt Electronic_Arts_Audio_Fmt YOP_Fmt Matrox_Setup_Program_Fmt
Vivado_Design_Suite_Fmt Meridian_Lossless_Packing_Fmt Electronic_Arts_SEAD_Fmt Electronic_Arts_MPC_Fmt PMP_Fmt DEGAS_Fmt
DEGAS_Compressed_Fmt
AutoCAD_Plotter_Fmt
Tiny_Stuff_Fmt
JV_Video_Fmt REDCode_Fmt SIFF_Video_Fmt VP6_Fmt MTV_Fmt RSO_Fmt Star3_Fmt DXA_Fmt MTH_Fmt

Number 1280 1281 1282 1283 1284 1285 1286 1287 1288
1289 1290 1291 1292 1293 1294
1295
1296
1297

Category 1211 1212 1213 1214 1215 1216 1217 1218 1219
1220 1221 1222 1223 1224 1225
1226
1227
1228

Description

MIME Type

XBV video

CRYO HNM4 video

CRYO HNM6 video

NXV video

On2 VP5 video

FutureVision FST video

Electronic Arts audio file

Psygnosis YOP video

Matrox Setup Program Archive MVA file

Xilinx Vivado Design Suite file

Meridian Lossless Packing Audio file

Electronic Arts SEAD audio

Electronic Arts MPC video

PMP video

DEGAS (Design & Entertainment Graphic Arts System) image

DEGAS (Design & Entertainment Graphic Arts System) compressed image

AutoCAD Plot Style and Configuration files

Tiny Stuff image

1298 1299 1300 1301 1302 1303 1304 1305 1306

1229 1230 1231 1232 1233 1234 1235 1236 1237

Bitmap Brothers JV video

REDCode video format

Beam Software SIFF video file

On2 VP6 video

Chinese MP4/MTV video

Mindstorm RSO audio

Creative Labs Star 3 audio

Runesoft DXA video

Nintendo GameCube video file

Extension XBV HNM HNM, HNS NXV VP5 FST STR YOP MVA
VDS MLP TGV MPC PMP PI1, PI2, PI3
PC1, PC2, PC3

File Class adMOVIE adMOVIE adMOVIE adMOVIE adMOVIE adMOVIE adSOUND adMOVIE adMISC
adMISC adSOUND adSOUND adMOVIE adMOVIE adRASTERIMAGE
adRASTERIMAGE

Readers

CTB, STB, PC3, PMP TNY, TN1, TN2, TN3, TN4, TN5, TN6 JV R3D VB VP6 MTV RSO ST3 DXA MTH

adCAD
adRASTERIMAGE
adMOVIE adMOVIE adMOVIE adMOVIE adMOVIE adSOUND adSOUND adMOVIE adMOVIE

Format Name MAD_Fmt Bink2_Fmt PVA_Fmt Interplay_ACMP_Fmt Ipix_Fmt IVR_Fmt
NuppelVideo_Fmt VFlash_PTX_Fmt PMD_Ringtone_Fmt RoQ_Fmt CRYO_APC_Fmt VGZ_Fmt Novastorm_Video_Fmt UTalk_Fmt Xbox_XMV_Fmt AbiWord_Fmt AbiWord_Template_Fmt Psion_Word_Fmt Psion_Sheet_Fmt Psion_Sketch_Fmt Psion_Record_Fmt Psion_MBM_Fmt
Psion_TextEd_Fmt Psion_AIF_Fmt
Psion_PIC_Fmt Psion_Object_Fmt Psion_Executable_Fmt Psion_Sound_Fmt

Number 1307 1308 1309 1310 1311 1312
1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328
1329 1330
1331 1332 1333 1334

Category 1238 1239 1240 1241 1242 1243
1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259
1260 1261
1262 1263 1264 1265

Description

MIME Type

Electronic Arts MAD video file

Bink Video 2 audio-video container

TechnoTrend PVA video

Interplay ACMP audio

Ipix spherical image

RealNetworks Internet Video Recording (IVR) file

NuppelVideo file

VTech V.Flash PTX image

Polyphonic Ringtone PMD audio

application/x-pmd

RoQ video

CRYO Interactive APC audio

VGZ video

Novastorm Media video file

MicroTalk/UTalk audio

Microsoft Xbox XMV video

AbiWord document

application/x-abiword

AbiWord template

Psion EPOC Word document

Psion EPOC Sheet spreadsheet

Psion EPOC Sketch image

Psion EPOC Record audio

Psion EPOC Multi-Bitmap (MBM) image

Psion EPOC TextEd file

Psion EPOC Application Information File (AIF)

Psion 3 PIC bitmap

Psion 3 OPL Object File

Psion 3 IMG/APP executable

Psion 3 Sound file

Extension MAD BIK, BK2 PVA
IPX IVR

File Class adMOVIE adMOVIE adMOVIE adSOUND adRASTERIMAGE adMOVIE

Readers

NUV

adMOVIE

PTX

adRASTERIMAGE

PMD

adSOUND

ROQ

adMOVIE

APC, HNM, BF, ZIK

adSOUND

VGZ

adMOVIE

FA, FLM

adMOVIE

UTK

adSOUND

XMV

adMOVIE

ABW

adWORDPROCESSOR

ABT

adWORDPROCESSOR

PSI, PSITEXT

adWORDPROCESSOR

PSISHEET

adSPREADSHEET

adRASTERIMAGE

adSOUND

MBM

adRASTERIMAGE

xmlsr stringssr

adWORDPROCESSOR stringssr

AIF

adRASTERIMAGE

PIC OPA, OPO IMG, APP WVE

adRASTERIMAGE adENCAPSULATION adEXECUTABLE adSOUND

Format Name Psion_Database_Fmt Psion_Word_3_Fmt Psion_Sheet_3_Fmt Zoner_Draw_Fmt
Zoner_BMI_Fmt TealDoc_Fmt TealPaint_Fmt PalmDOC_Fmt QiOO_Fmt Plucker_Fmt eReader_Fmt
Quickword_Fmt Quicksheet_Fmt Quickpoint_Fmt TealMeal_Fmt zTXT_Fmt TomeRaider_Fmt TomeRaider_PDB_Fmt WordSmith_Fmt iSilo_Fmt SuperMemo_Fmt BDicty_Fmt PalmOS_Executable_Fmt PalmOS_Library_Fmt Shanda_Bambook_Fmt PMLZ_Fmt
Rocket_eBook_Fmt iBooks_Author_Fmt

Number 1335 1336 1337 1338
1339 1340 1341 1342 1343 1344 1345
1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360
1361 1362

Category 1266 1267 1268 1269
1270 1271 1272 1273 1274 1275 1276
1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291
1292 1293

Description

MIME Type

Psion EPOC Database

Psion 3 Word document

Psion 3 Sheet spreadsheet

Zoner Draw / Zoner Callisto Metafile (ZMF) version 4+

Zoner BMI image

TealDoc PalmOS eBook

TealPaint PalmOS eBook

PalmDOC / Aportis DOC eBook

application/x-aportisdoc

QiOO mobile eBook

Plucker eBook

application/prs.plucker

eReader (Palm Reader/ Peanut Reader) eBook

PalmOS Quickword document

PalmOS Quicksheet document

PalmOS Quickpoint document

TealMeal PalmOS database

zTXT eBook

application/x-pdb-ztxt-ebook

TomeRaider eBook

TomeRaider PDB eBook

PalmOS Wordsmith document

PalmOS iSilo document

application/x-pdb-isilo-ebook

PalmOS SuperMemo document

PalmOS BDicty document

PalmOS executable

application/vnd.palm

PalmOS dynamic library

Shanda Bambook eBook

application/x-snb-ebook

Palm Markup Language (PMLZ) eBook

Rocket eBook

application/x-rocketbook

Apple iBooks Author eBook

application/vnd.apple.ibauthor

Extension
WRD SPR ZMF
BMI PDB PDB PRC, PDB JAR PDB PDB
PRC PRC PRC PDB PDB TR TR2, TR3
PDB KNO, PDB PDB PRC PRC SNB PMLZ
RB IBA

File Class adDATABASE adWORDPROCESSOR adSPREADSHEET adVECTORGRAPHIC

Readers stringssr

adRASTERIMAGE adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR

adWORDPROCESSOR adSPREADSHEET adPRESENTATION adDATABASE adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adEXECUTABLE adLIBRARY adWORDPROCESSOR adWORDPROCESSOR

stringssr

adWORDPROCESSOR adWORDPROCESSOR

Format Name Statistica_Spreadsheet_Fmt Statistica_Graph_Fmt Statistica_Scrollsheet_Fmt Apple_Newton_Package_Fmt

Number 1363 1364 1365 1366

Adobe_Zip_Extension_Fmt

1367

Uniform_Office_Fmt Uniform_Office_Text_Fmt

1368 1369

Uniform_Office_Spreadsheet_Fmt Uniform_Office_Presentation_Fmt Uniform_Office_Zip_Fmt

1370 1371 1372

Uniform_Office_Text_Zip_Fmt

1373

Uniform_Office_Spreadsheet_Zip_Fmt 1374

Uniform_Office_Presentation_Zip_Fmt 1375

MacDraft_Fmt RagTime_Fmt MacDraw_Fmt Wingz_Fmt Claris_Draw_Fmt BeagleWorks_Word_Fmt

1376 1377 1378 1379 1380 1381

BeagleWorks_Database_Fmt

1382

BeagleWorks_Spreadsheet_Fmt

1383

BeagleWorks_Paint_Fmt

1384

BeagleWorks_Draw_Fmt

1385

Category 1294 1295 1296 1297
1298
1299 1300
1301 1302 1303
1304
1305
1306
1307 1308 1309 1310 1311 1312
1313
1314
1315
1316

Description

MIME Type

Statsoft Statistica Spreadsheet

Statsoft Statistica Graph File

Statsoft Statistica Scrollsheet

Apple Newton executable/installer/file

Adobe Zip Format Extension Package (ZXP)

application/vnd.adobe.air-ucf-package+zip

Uniform Office Format document

Uniform Office Format word processing document

application/vnd.uof.text

Uniform Office Format spreadsheet

application/vnd.uof.spreadsheet

Uniform Office Format presentation

application/vnd.uof.presentation

Uniform Office Format document, zip format

Uniform Office Format word processing document, zip format

application/vnd.uof.text+zip

Uniform Office Format spreadsheet, zip format

application/vnd.uof.spreadsheet+zip

Uniform Office Format presentation, zip format

application/vnd.uof.presentation+zip

MacDraft drawing

RagTime document

MacDraw drawing

Wingz spreadsheet

Claris Draw document

BeagleWorks (later WordPerfect Works) Word Processor document

BeagleWorks (later WordPerfect Works) Database document

BeagleWorks (later WordPerfect Works) Spreadsheet document

BeagleWorks (later WordPerfect Works) Paint document

BeagleWorks (later WordPerfect Works) Draw document

Extension STA STG SCR PKG ZXP UOF UOF, UOT UOF, UOS UOF, UOP UOF UOF, UOT UOF, UOS UOF, UOP DRW, MDD RAG, RTD
WKZ
BW, WPW BW, WPW BW, WPW BW, WPW BW, WPW

File Class adSPREADSHEET adVECTORGRAPHIC adSPREADSHEET adEXECUTABLE

Readers

adENCAPSULATION

adWORDPROCESSOR adWORDPROCESSOR

xmlsr xmlsr

adSPREADSHEET adPRESENTATION adWORDPROCESSOR

adWORDPROCESSOR

adSPREADSHEET

adPRESENTATION

adCAD adDESKTOPPUBLSH adVECTORGRAPHIC adSPREADSHEET adVECTORGRAPHIC adWORDPROCESSOR

stringssr

adDATABASE

adSPREADSHEET

adRASTERIMAGE

adVECTORGRAPHIC

Format Name GreatWorks_Word_Fmt GreatWorks_Outline_Fmt GreatWorks_Database_Fmt GreatWorks_Spreadsheet_Fmt GreatWorks_Draw_Fmt GreatWorks_Chart_Fmt MS_Works_3_Mac_WP_Fmt MS_Works_3_Mac_DB_Fmt MS_Works_3_Mac_SS_Fmt MS_Works_3_Mac_Comm_Fmt MS_Works_3_Mac_Draw_Fmt SAP_VDS_Fmt ZIPVFS_Fmt Right_Hemisphere_Material_Fmt RH_Thumbnails_Fmt Westwood_Studios_Audio_Fmt Shockwave_Stream_Fmt EGG_Video_Fmt IRCAM_Fmt Sierra_Audio_Fmt TiVo_Video_Fmt OptimFROG_Fmt

Number 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407

Category 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338

Description

MIME Type

Symantec GreatWorks Word Processor document

Symantec GreatWorks Outline document

Symantec GreatWorks Database document

Symantec GreatWorks Spreadsheet document

Symantec GreatWorks Draw document

Symantec GreatWorks Chart document

Microsoft Works for Mac, version 3 and 4, Word Processor document

application/x-msworks

Microsoft Works for Mac, version 3 and 4, Database

application/x-msworks

Microsoft Works for Mac, version 3 and 4, Spreadsheet

application/x-msworks

Microsoft Works for Mac, version 3 and 4, Communications document

application/x-msworks

Microsoft Works for Mac, version 3 and 4, Draw document

application/x-msworks

SAP 3D Visual Enterprise VDS document

ZIPVFS SQLite compressed read/write database

Right Hemisphere Material file

Right Hemisphere thumbnail collection file

Westwood Studios Audio file

Shockwave Stream audio-video file

EGG video file

IRCAM audio file

Sierra Entertainment audio file

TiVo video

OptimFROG audio

Extension
MSW, WPS WDB WKS
MSW VDS SQLITE RH, RHM $RH AUD STREAM EGG IRCAM SOL TY+ OFR, OFS

File Class adWORDPROCESSOR

Readers stringssr

adOUTLINE

adDATABASE

adSPREADSHEET

adVECTORGRAPHIC

adVECTORGRAPHIC

adWORDPROCESSOR

adDATABASE

adSPREADSHEET

adCOMMUNICATION

adVECTORGRAPHIC

adCAD

adDATABASE

adCAD adCAD

adSOUND adMOVIE adMOVIE adSOUND adSOUND adMOVIE adSOUND

Format Name LPAC_Fmt
RK_Audio_Fmt
Asylum_Music_Fmt Novastorm_Audio_Fmt HHE_Fmt Portable_Voice_Fmt CNM_Video_Fmt Phantom_Cine_Fmt MPEG2_Transport_Stream_Fmt Audacity_Project_Fmt Voltage_VSF_Fmt
XLIFF_Fmt
XBRL_Fmt
AuditXPressX_Fmt Box_Note_Fmt Hikvision_DVR_Fmt Electronic_Arts_TGV_Fmt Electronic_Arts_TGQ_Fmt Reaper_Video_Fmt Lightweight_Video_Fmt Liquid_Audio_Fmt Extended_Instrument_Fmt
MAML_Fmt
MS_Chat_Character_Fmt MS_Border_Fmt MS_Binary_Log_Fmt

Number 1408
1409
1410 1411 1412 1413 1414 1415 1416 1417 1418
1419
1420
1421 1422 1423 1424 1425 1426 1427 1428 1429
1430
1431 1432 1433

Category 1339
1340
1341 1342 1343 1344 1345 1346 1347 1348 1349
1350
1351
1352 1353 1354 1355 1356 1357 1358 1359 1360
1361
1362 1363 1364

Description

MIME Type

Lossless Predictive Audio Compression file

RK Audio lossless compressed audio

Asylum Music Format

Novastorm Media audio file

HHE video

Portable Voice Format audio

Arxel CNM audio-video format

Phantom Cine video file

MPEG-2 Transport Stream video

Audacity audio project file

application/x-audacity-project

Micro Focus Voltage VSF encrypted file

XML Localization Interchange File Format (XLIFF)

application/xliff+xml

Extensible Business Reporting Language (XBRL)

AuditXPressX file

Box Note document

Hikvision DVR video

Electronic Arts TGV video

Electronic Arts TGQ video

Reaper Video

Lightweight Video Format (LVF)

Liquid Audio

eXtended Instrument generic audio tracker

Microsoft Assistance Markup Language

Microsoft Comic Chat Character

Microsoft Office Border images

Microsoft Binary Log file

Extension PAC
RKA
AMF SMP HHE PVF CNM CINE M2TS AUP VDF
XLF
XBRL
AXPX BOXNOTE
TGV TGQ FMV LVF LQT XI
AML
AVB BDR BLG

File Class adSOUND

Readers

adSOUND

adSOUND adSOUND adMOVIE adSOUND adMOVIE adMOVIE adMOVIE adSOUND adENCAPSULATION

adWORDPROCESSOR xmlsr

adWORDPROCESSOR xmlsr

adWORDPROCESSOR adWORDPROCESSOR adMOVIE adMOVIE adMOVIE adMOVIE adMOVIE adSOUND adSOUND

adWORDPROCESSOR xmlsr

adRASTERIMAGE adRASTERIMAGE adMISC

Format Name MS_Reader_eBook_Fmt MS_Reader_Annotations_Fmt Amazon_KFX_Aux_Fmt
Amazon_KFX_Ion_Fmt
MS_DPAPI_Fmt
MS_Streets_Fmt MS_Fast_Find_Index_Fmt MS_Fresh_Paint_Fmt MS_Mathematics_Fmt MS_Instrument_Definition_Fmt
MS_Pocket_Streets_Fmt Obfuscated_OpenType_Fmt
Pfaff_PCS_Fmt Janome_JEF_Fmt Husqvarna_HUS_Fmt
Husqvarna_VIP_Fmt
Brother_PEC_Fmt Brother_PES_Fmt Viking_SHV_Fmt VP3_Fmt SEW_Fmt Data_Stitch_Tajima_Fmt
Singer_XXX_Fmt Bernina_ART_Fmt MS_Prefetch_Fmt

Number 1434 1435 1436
1437
1438
1439 1440 1441 1442 1443
1444 1445
1446 1447 1448
1449
1450 1451 1452 1453 1454 1455
1456 1457 1458

Category 1365 1366 1367
1368
1369
1370 1371 1372 1373 1374
1375 1376
1377 1378 1379
1380
1381 1382 1383 1384 1385 1386
1387 1388 1389

Description

MIME Type

Microsoft Reader eBook file

Microsoft Reader annotation file

Amazon KFX eBook auxiliary format (2015)

Amazon KFX eBook Ion format (2015)

Microsoft Data Protection API (DPAPI) data

Microsoft Streets & Trips map

Microsoft Office Fast Find Index

Microsoft Fresh Paint image

Microsoft Mathematics worksheet

Microsoft MIDI Instrument Definition File

Microsoft Pocket Streets map

Obfuscated OpenType font (ODTTF)

application/vnd.ms-package.obfuscated-opentype

Pfaff PCS embroidery image

Janome JEF embroidery format

Husqvarna Viking HUS embroidery format

Husqvarna Viking-Pfaff VIP embroidery format

Brother PEC embroidery format

Brother PES embroidery format

Viking SHV embroidery format

VP3 embroidery format

SEW embroidery format

Data Stitch Tajima (DST) embroidery image

Singer XXX embroidery image

Bernina ART embroidery image

Microsoft Windows Prefetch (uncompressed) file

Extension LIT EBO KFX, AZW
KFX, AZW, ION
EST FFX FPPX GCW IDF
MPS ODTTF
PCS JEF HUS
VIP
PEC PES SHV VP3 SEW DST
XXX ART PF

File Class adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR

Readers

adWORDPROCESSOR

adMISC

adGIS adMISC adRASTERIMAGE adSCIENTIFIC adSOUND

adGIS adFONT

adVECTORGRAPHIC adVECTORGRAPHIC adVECTORGRAPHIC

adVECTORGRAPHIC

adVECTORGRAPHIC adVECTORGRAPHIC adVECTORGRAPHIC adVECTORGRAPHIC adVECTORGRAPHIC adVECTORGRAPHIC

adVECTORGRAPHIC adVECTORGRAPHIC adMISC

IDOL KeyView (12.12)

Page 160 of 280

Filter SDK Java Programming Guide Appendix A: Supported Formats

Format Name MS_Prefetch_Compressed_Fmt
MS_MapPoint_Fmt MS_Live_Meeting_Fmt
MS_Speech_Definitions_Fmt
MS_Speech_Data_Fmt
MS_SQL_CE_Fmt
MS_ICE_Project_Fmt
MS_DVR_Fmt
Symbol_Dynamics_EXP_Fmt
XNA_Compiled_Fmt Outlook_Shortcut_Fmt
ChiWriter_Fmt
ChiWriter4_Fmt Lightning_Strike_Fmt Blackberry_Executable_Fmt EndNote_Library_Fmt EndNote_Library_X_Fmt
EndNote_Filter_Fmt EndNote_Style_Fmt EndNote_Connection_Fmt Camtasia_Recording_Fmt Camtasia_Project_Fmt TechSmith_Project_Fmt ABIF_Fmt

Number 1459
1460 1461
1462
1463
1464
1465
1466
1467
1468 1469
1470
1471 1472 1473 1474 1475
1476 1477 1478 1479 1480 1481 1482

Category 1390
1391 1392
1393
1394
1395
1396
1397
1398
1399 1400
1401
1402 1403 1404 1405 1406
1407 1408 1409 1410 1411 1412 1413

Description

MIME Type

Microsoft Windows Prefetch (compressed) file

Microsoft MapPoint map

Microsoft Office Live Meeting Connection

Microsoft text-to-speech Speech Definitions File

Microsoft text-to-speech Speech Data File

Microsoft SQL Server Compact (CE) edition database

Microsoft Image Composite Editor (ICE) Project

Microsoft Digital Video Recording (DVR-MS)

video/x-ms-dvr

Symbol Dynamics EXP v1-4 document

Microsoft XNA Compiled Format

Microsoft Outlook or Exchange folder shortcut

ChiWriter document (up to version 3)

ChiWriter document (version 4)

Lightning Strike image

image/cis-cod

Blackberry executable

EndNote Library (up to version 9) application/x-endnote-library

EndNote Library (version X onwards)

EndNote Filter

application/x-puid-fmt-327

EndNote Style

application/x-endnote-style

EndNote Connection

application/x-endnote-connect

Camtasia Recording

Camtasia XML Project

TechSmith JSON Project

Applied Biosystems Inc. Format

Extension PF
PTM RTC
SDF
SPD
SDF
SPJ
DVR-MS
WXP
XNB XNK
CHI
CHI COD COD ENL ENL, ENLX
ENF ENS ENZ CAMREC CAMPROJ TSCPROJ AB1, FSA

File Class adMISC

Readers

adGIS adSCHEDULE

adMISC

adDATABASE

adDATABASE

adMISC

adMOVIE

adWORDPROCESSOR stringssr

adENCAPSULATION adMISC

adWORDPROCESSOR

adWORDPROCESSOR adRASTERIMAGE adEXECUTABLE adDATABASE adDATABASE

adDATABASE adDATABASE adDATABASE adMOVIE adWORDPROCESSOR adWORDPROCESSOR adSCIENTIFIC

Format Name
CIF_Fmt Sibelius_Fmt Geogebra_Worksheet_Fmt Geogebra_Tool_Fmt Polynomial_Texture_Map_Fmt Poly_Tracker_Fmt PC_Outline_Fmt Spline_Font_Database_Fmt QuickTime_Image_Fmt XBin_Image_Fmt Segmented_Hypergraphics_Fmt
LEADTools_CMP_Fmt WBMP_Fmt Blender_Fmt Blender_v1_Fmt Scribus_Fmt LyX_Fmt NZB_Fmt KWord_Fmt KSpread_Fmt KPresenter_Fmt KWord_GZ_Fmt
KSpread_GZ_Fmt
KPresenter_GZ_Fmt
Karbon_Fmt KChart_Fmt KPlato_Fmt

Number
1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493
1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504
1505
1506
1507 1508 1509

Category
1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424
1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435
1436
1437
1438 1439 1440

Description (ABIF) Crystallographic Information File Sibelius musical score Geogebra worksheet Geogebra tool Polynomial Texture Map (PTM) Poly Tracker audio PC-Outline document Spline Font Database (SFD) font QuickTime (QTIF) image XBin image MS Segmented Hypergraphics image LEADTools CMP image Wireless Bitmap image (WBMP) Blender (v2) CAD file Blender (v1) CAD file Scribus document LyX document NewzBin NZB format KOffice KWord document KOffice KSpread document KOffice KPresenter document KOffice (up to v1.1) kWord document KOffice (up to v1.1) kSpread document KOffice (up to v1.1) kPresenter document KOffice Karbon document KOffice KChart document KOffice KPlato document

MIME Type chemical/x-cif application/vnd.geogebra.file
image/x-quicktime
image/vnd.wap.wbmp application/x-blender application/x-blender application/vnd.scribus application/x-lyx application/x-nzb application/vnd.kde.kword application/vnd.kde.kspread application/vnd.kde.kpresenter application/x-kword application/x-kspread application/x-kpresenter application/vnd.kde.karbon application/vnd.kde.kchart application/x-vnd.kde.kplato

Extension

File Class

Readers

CIF SIB GGB GGT PTM PTM PCO SFD QTIF, QIF, QTI XB SHG
CMP WBMP BLEND BLEND SLA LYX NZB KWD KSP KPR KWD
KSP
KPR
KARBON CHRT KPLATO

adSCIENTIFIC adSOUND adSCIENTIFIC adSCIENTIFIC adRASTERIMAGE adSOUND adWORDPROCESSOR adFONT adRASTERIMAGE adRASTERIMAGE adRASTERIMAGE

adRASTERIMAGE adRASTERIMAGE adCAD adCAD adDESKTOPPUBLSH adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adSPREADSHEET adPRESENTATION adWORDPROCESSOR

lyxsr

adSPREADSHEET

adPRESENTATION

adVECTORGRAPHIC adSPREADSHEET adSCHEDULE

Format Name GIMP_Pattern_Fmt GIMP_Brush_Fmt GIMP_Animated_Brush_Fmt Git_Pack_Index_Fmt Git_Index_Fmt MS_Tape_Fmt STL_Binary_Fmt
Unix_Shadow_Fmt MS_SQL_Log_Fmt DER_Certificate_Fmt EDIFACT_Fmt X12_Fmt Mathcad_Fmt Mathcad_XML_Fmt EDrawings_Fmt
First_Choice_DB_Fmt First_Choice_WP_Fmt
First_Choice_SS_Fmt Professional_Plan_Fmt PFS_Write_Fmt Symantec_QA_Fmt Bitmap_Graphics_Array_Fmt OS2_Help_Fmt Frame_Vector_Fmt RBase_2_Fmt Harvard_Graphics_Symbol2_Fmt Freelance_Graphics_Fmt Snoop_Capture_Fmt

Number 1510 1511 1512 1513 1514 1515 1516
1517 1518 1519 1520 1521 1522 1523 1524

Category 1441 1442 1443 1444 1445 1446 1447
1448 1449 1450 1451 1452 1453 1454 1455

Description

MIME Type

GIMP Pattern file

GIMP Brush file

GIMP Animated Brush file

Git Pack Index format

Git Index format

Microsoft Tape Format

3D Systems Stereolithography STL Binary Format

Unix /etc/shadow password file

Microsoft SQL Server log

DER-encoded X509 certificate

application/x-x509-user-cert

EDIFACT-encoded EDI document application/edifact

X12-encoded EDI document

application/edi-x12

Mathcad MCD document

application/vnd.mcd

Mathcad XMCD document

application/x-mathcad

eDrawings Publisher document

1525 1526
1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537

1456 1457
1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468

PFS First Choice database

database/x-firstchoice

PFS First Choice word-processing document

PFS First Choice spreadsheet

application/x-first-choice

PFS Professional Plan spreadsheet application/x-pfs-plan

PFS Professional Write document application/x-pfsprofessionalwrite

Symantec Q&A Database

OS/2 Bitmap Graphics Array

image/bga

OS/2 Help/INF document

Frame Vector Metafile

R:Base database (v2-v4)

Harvard Graphics Symbol File (v2)

Lotus Freelance Graphics image

Snoop Packet Capture file

Extension PAT GBR GIH IDX INDEX MTF, BAK
LDF DER, CER EDI EDI MCD XMCD EASM, EPRT, EDRW FOL DOC
SS
PFS DTF BGA, BMP, ICO HLP, INF FMV RBF SYM DRW CAP, SNOOP

File Class adRASTERIMAGE adRASTERIMAGE adRASTERIMAGE adENCAPSULATION adENCAPSULATION adENCAPSULATION adCAD

Readers

adMISC adDATABASE adENCAPSULATION adDATABASE adDATABASE adSCIENTIFIC adSCIENTIFIC adCAD

xmlsr

adDATABASE adWORDPROCESSOR

adSPREADSHEET adSPREADSHEET adWORDPROCESSOR adDATABASE adRASTERIMAGE adWORDPROCESSOR adVECTORGRAPHIC adDATABASE adVECTORGRAPHIC adRASTERIMAGE adENCAPSULATION

Format Name Python_Pickle_Fmt Matlab_Pcode_Fmt Rhinoceros_3D_Fmt GL_Transmission_Binary_Fmt
CAD_3DXML_Fmt CAD_3DXML_XML_Fmt Autodesk_Fusion_360_Fmt DELFTship_Fmt Autodesk_Inventor_Drawing_Fmt Autodesk_Inventor_Part_Fmt Autodesk_Inventor_Assembly_Fmt Autodesk_Revit_Fmt
FreeCAD_Fmt Solid_Edge_Part_Fmt Solid_Edge_Assembly_Fmt Solid_Edge_SheetMetal_Fmt SolidWorks_Visualize_Project_Fmt Apache_Parquet_Fmt AES_Crypt_Fmt SO_Math_XML_Fmt
MathML_Fmt Photoshop_Brush_Fmt Photoshop_Color_Book_Fmt Premiere_Project_Fmt
Premiere_Title_Fmt Premiere_Pro_Title_Fmt Memgraph_Fmt Memgraph_XML_Fmt

Number 1538 1539 1540 1541
1542 1543 1544 1545 1546 1547 1548 1549

Category 1469 1470 1471 1472
1473 1474 1475 1476 1477 1478 1479 1480

Description Python Pickle file Matlab P-code file Rhinoceros 3D Model Graphics Language (GL) Binary Transmission Format 3DVIA 3DXML archive 3DVIA 3DXML XML document Autodesk Fusion 360 model DELFTship or FREE!ship model Autodesk Inventor drawing Autodesk Inventor part Autodesk Inventor assembly Autodesk Revit document

MIME Type
model/gltf+binary application/x-3dxmlplugin

1550 1551 1552 1553 1554 1555 1556 1557
1558 1559 1560 1561
1562 1563 1564 1565

1481 1482 1483 1484 1485 1486 1487 1488
1489 1490 1491 1492
1493 1494 1495 1496

FreeCAD document

Solid Edge part

Solid Edge assembly

Solid Edge sheet metal

SolidWorks Visualize project

Apache Parquet document

AES Crypt document

OpenDocument format (OpenOffice 1/StarOffice 6,7) Math XML

application-vnd.sun.xml.math

MathML document

application/mathml+xml

Adobe Photoshop Brush document image/x-adobe-photoshop-brush

Adobe Photoshop Color Book

Adobe Premiere Elements/Pro project

Adobe Premiere title document

Adobe Premiere Pro title document

Memgraph database plist format

application/x-bplist-memgraph

Memgraph database XML format

Extension PICKLE, PKL, P P 3DM GLB
3DXML 3DXML F3D FBM IDW IPT IAM RVT, RFA, RTE, RFT FCSTD PAR ASM PSM SVPJ PARQUET AES SXM
MML, MATHML ABR ACB PRPROJ, PREL
PTL PRTL MEMGRAPH MEMGRAPH

File Class adEXECUTABLE adSOURCECODE adCAD adCAD
adCAD adCAD adCAD adCAD adCAD adCAD adCAD adCAD
adCAD adCAD adCAD adCAD adCAD adDATABASE adENCAPSULATION adMISC
adMISC adMISC adMISC adMISC
adMISC adMISC adDATABASE adDATABASE

Readers
olesr olesr olesr parquetsr

Format Name AV1_Image_Fmt AV1_Image_Sequence_Fmt
IVF_Fmt AV1_Image_IVF_Fmt VP8_IVF_Fmt HPROF_Fmt XLIFF_Compressed_Fmt
Scenarist_Caption_Fmt SubRip_Text_Fmt
EBU_Subtitling_Fmt
Apache_ORC_Fmt
NES_Sound_Fmt IW13_IWA_Fmt BioRad_Image_Fmt NIfTI_Fmt MRC_DV_Fmt
MRC_CCP4_Fmt ECAT_PET_Fmt OME_XML_Fmt
Panasonic_RAW_Fmt
Panasonic_RW2_Fmt FujiFilm_RAF_Fmt Olympus_ORF_Fmt HEVC_Fmt
PAM_Fmt

Number 1566 1567
1568 1569 1570 1571 1572
1573 1574
1575
1576
1577 1578 1579 1580 1581
1582 1583 1584
1585
1586 1587 1588 1589
1590

Category 1497 1498
1499 1500 1501 1502 1503
1504 1505
1506
1507
1508 1509 1510 1511 1512
1513 1514 1515
1516
1517 1518 1519 1520
1521

Description

MIME Type

AV1 Image Format (AVIF)

image/avif

AV1 Image Sequence Format (AVIFS)

image/avif-sequence

IVF container document

AV1 Image (IVF container)

image/avif

VP8 Video (IVF container)

HPROF Java Profiler document

application/vnd.java.hprof

XML Localization Interchange File Format compressed (XLIFF)

application/xliff+zip

Scenarist Closed Caption document

SubRip Text (SRT) subtitles document

EBU Subtitling data exchange format

Apache ORC (Optimized Row Columnar) data

NES Sound File

Apple iWork 2013 IWA document

BioRad confocal image

NIfTI (NII) neuroimaging document

MRC Deltavision (DV) / Priism image

MRC CCP4 2014 image

ECAT medical PET image

Open Microscopy Environment (OME) XML document

Panasonic RAW or Leica RWL image

image/x-panasonic-raw

Panasonic RW2 image

image/x-panasonic-rw2

FujiFilm RAF image

image/x-fuji-raf

Olympus ORF image

image/x-olympus-orf

High Efficiency Video Coding (HEVC) MP4 document

video/h265

Portable Arbitrary Map (PAM) image image/x-portable-arbitrarymap

Extension AVIF AVIFS
IVF AVIF, AVIFS VP8 HPROF XLZ
SCC SRT
STL
ORC
NSF IWA PIC NII DV
MRC V XML
RAW, RWL
RW2 RAF ORF HEVC, H265
PAM

File Class adRASTERIMAGE adANIMATION

Readers

adRASTERIMAGE adRASTERIMAGE adMOVIE adMISC adWORDPROCESSOR

adWORDPROCESSOR adWORDPROCESSOR

adWORDPROCESSOR

adDATABASE

orcsr

adSOUND adMISC adSCIENTIFIC adSCIENTIFIC adSCIENTIFIC

adSCIENTIFIC adSCIENTIFIC adSCIENTIFIC

adRASTERIMAGE

adRASTERIMAGE adRASTERIMAGE adRASTERIMAGE adMOVIE

adRASTERIMAGE

Format Name Paris_Audio_Fmt Calendar_Creator_Fmt
IWork_2013_Protected_Fmt
Corel_Wavelet_WVL_Fmt Corel_Wavelet_WI_Fmt Corel_Painter_RIF_Fmt OmniPage_MET_Fmt OmniPage_OPD_Fmt GPS_Exchange_Fmt GL_Transmission_Fmt CorelChart_Fmt LocoScript_PCW_Fmt
LocoScript_DOS_Fmt IWork_2005_Protected_Fmt
JAR_Pack_Fmt
GDIFF_Fmt AFP_Fmt
NSIF_Fmt
XSL_FO_Fmt Consolidated_CDA_Fmt WebAssembly_Binary_Fmt Visual_Studio_SDF_Fmt
MS_Pocket_Word_PocketPC_Fmt
PEA_Fmt

Number 1591 1592
1593
1594 1595 1596 1597 1598 1599 1600 1601 1602
1603 1604
1605
1606 1607
1608
1609 1610 1611 1612
1613
1614

Category 1522 1523
1524
1525 1526 1527 1528 1529 1530 1531 1532 1533
1534 1535
1536
1537 1538
1539
1540 1541 1542 1543
1544
1545

Description

MIME Type

Paris Audio Format

Broderbund Calendar Creator document (v4+)

iWork 2013 password-protected document

Corel Wavelet WVL image

Corel Wavelet WI image

Corel Painter RIFF image

Caere OmniPage MET document

Caere OmniPage OPD document

GPS Exchange Format

application/gpx+xml

GL Transmission Text Format

model/gltf+json

CorelChart document

LocoScript document for Amstrad PCW

LocoScript document for MS-DOS

iWork 2005-2009 password-protected document

Java Archive compressed with pack200

application/x-java-pack200

GDIFF (Generic Diff) document

application/gdiff

IBM Advanced Function Presentation (AFP) image

application/vnd.ibm.modcap

NATO Secondary Image Format (NSIF) image

XSL Formatting Object (XSL-FO)

Consolidated CDA document

WebAssembly (WASM) binary-code application/wasm

Microsoft Visual Studio browsing database (sdf) file

Microsoft Pocket Word for Pocket PC

PEA (Pack, Encrypt, Authenticate) archive

Extension

File Class

FAP, PAF

adSOUND

CC3, CE3, CC5, BCC

adSCHEDULE

PAGES, NUMBERS, KEY

adWORDPROCESSOR

WVL

adRASTERIMAGE

WI

adRASTERIMAGE

RIF

adRASTERIMAGE

MET

adMISC

OPD

adMISC

GPX

adGIS

GLTF

adCAD

CCH

adVECTORGRAPHIC

adWORDPROCESSOR

Readers

adWORDPROCESSOR

PAGES, NUMBERS, KEY

adWORDPROCESSOR

PACK

adENCAPSULATION

adMISC

AFP

adRASTERIMAGE

NSF

adRASTERIMAGE

FO, XSLFO XML WASM SDF

adWORDPROCESSOR adWORDPROCESSOR adEXECUTABLE adSWDEV

xmlsr xmlsr

PSW, PWI

adWORDPROCESSOR

PEA

adENCAPSULATION

Format Name MS_Pocket_Excel_PocketPC_Fmt
TTML_Fmt
Visual_SourceSafe_SCC_Fmt
NetBeans_Profiler_Fmt Mac_Alias_Fmt Firebird_DB_Fmt InterBase_DB_Fmt LZip_Fmt UltraCompressor_Fmt PostgreSQL_Filenode_Fmt
Zebra_Metafile_Fmt Kodak_Cineon_Fmt Apple_Image2_Fmt Apple_Image3_Fmt Apple_Image4_Fmt Apple_EFI_Image_Fmt Secure_Capsule_Fmt
Compact_Font_Fmt QML_Cached_Fmt KV_Mail_Subfile_Fmt
JSON_Fmt DesignPro_Fmt Edraw_Max_Fmt ActivInspire_Fmt ActivStudio_Fmt

Number 1615
1616
1617
1618 1619 1620 1621 1622 1623 1624
1625 1626 1627 1628 1629 1630 1631
1632 1633 1634
1635 1636 1637 1638 1639

Category 1546
1547
1567
1548 1549 1550 1551 1552 1553 1554
1555 1556 1557 1558 1559 1560 1561
1562 1563 1564
1565 1566 1568 1569 1570

Description

MIME Type

Microsoft Pocket Excel for Pocket PC

Timed Text Markup Language (TTML) document

Microsoft Visual SourceSafe SCC (Source Code Control) file

Java NetBeans Profiler snapshot

Mac OS alias file

Firebird database

InterBase database

lzip compressed archive

application/lzip

UltraCompressor II archive

PostgreSQL mapped relation file (pg_filenode.map)

Zoner Zebra Metafile image

Kodak Cineon image

Apple iOS Image2 document

Apple iOS Image3 document

Apple iOS Image4 document

Apple EFI Image

MacOS Secure Capsule firmware update

Adobe Compact Font Format (CFF) application/font-cff

QML Cached document

Internal mail file produced by KeyView extraction from a mail container format

JSON document

application/json

Avery DesignPro document

Edraw Max document

ActivInspire flipchart document

ActivStudio and ActivPrimary document

Extension PXL
TTML
SCC
NPS
FDB GDB LZ UC2 MAP
ZBR CIN IMG2 IMG3 IMG4, IM4M EFIRES SCAP
CFF QMLC MAIL
JSON ZDL, ZDP EDDX FLIPCHART FLP

File Class adSPREADSHEET

Readers

adWORDPROCESSOR

adMISC

adSWDEV adMISC adDATABASE adDATABASE adENCAPSULATION adENCAPSULATION adDATABASE

adVECTORGRAPHIC adRASTERIMAGE adOS adOS adOS adOS adOS

adFONT adSWDEV adWORDPROCESSOR afsr

adWORDPROCESSOR adPRESENTATION adPRESENTATION adPRESENTATION adPRESENTATION

Format Name Gravit_Designer_Fmt SANM_Fmt ICEDraw_Fmt MS_Equation_Fmt Affinity_Fmt
IOS_App_Store_Package_Fmt Minitab_Worksheet_Fmt Minitab_Worksheet_12_Fmt Minitab_Worksheet_14_Fmt Minitab_Worksheet_19_Fmt Minitab_Project_Fmt Minitab_Project_19_Fmt NIST_ITL_Fmt Silo_SIA_Fmt Silo_SIB_Fmt XCBF_Fmt Zoner_Draw_OLE_Fmt
Zoner_Photo_Studio_Fmt Calligra_Plan_Fmt Symbol_Dynamics_EXP5_Fmt
REX2_Fmt WPS_Office_WP_Fmt WPS_Office_PG_Fmt WPS_Office_SS_Fmt MS_InfoPath_Fmt MS_InfoPath_XSF_Fmt PerfectWorks_Fmt CAJ_Fmt

Number 1640 1641 1642 1643 1644

Category 1571 1572 1573 1574 1575

Description

MIME Type

Gravit Designer document

LucasArts Smush SANM animation

iCEDraw character graphics image

Microsoft Equation Editor object

Affinity Photo/Publisher/Designer document

1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656
1657 1658 1659
1660 1661 1662 1663 1664 1665 1666 1667

1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587
1588 1589 1590
1591 1592 1593 1594 1595 1596 1597 1598

iOS App Store Package

Minitab worksheet v5-6

Minitab worksheet v12-13

Minitab worksheet v14-18

Minitab worksheet v19-

Minitab project up to v18

Minitab project v19-

NIST-ITL standard data

Nevercenter Silo 3D ASCII model

Nevercenter Silo 3D binary model

XML Common Biometric Format

Zoner Draw / Zoner Callisto Metafile (ZMF) version 2-3

Zoner Photo Studio document

Calligra Plan document

application/x-vnd.kde.plan

Symbol Dynamics EXP v5+ document

REX2 audio file

Kingsoft WPS Office Writer

application/wps-office.wps

Kingsoft WPS Office Presentation application/wps-office.dps

Kingsoft WPS Office Spreadsheet application/wps-office.et

Microsoft InfoPath document

Microsoft InfoPath form definition

Novell PerfectWorks document

Chinese Academic Journal CAJ

Extension

File Class

GVDESIGN

adVECTORGRAPHIC

SNM, ZNM

adANIMATION

IDF

adRASTERIMAGE

adWORDPROCESSOR

AFPHOTO, AFPUB, AFDESIGN, AFTEMPLATE

adRASTERIMAGE

IPA

adENCAPSULATION

MTW

adSCIENTIFIC

MTW

adSCIENTIFIC

MTW

adSCIENTIFIC

MWX

adSCIENTIFIC

MPJ

adSCIENTIFIC

MPX

adSCIENTIFIC

XML

adSCIENTIFIC

SIA

adCAD

SIB

adCAD

XML

adSCIENTIFIC

ZMF

adVECTORGRAPHIC

Readers

ZPS PLAN WXP

adRASTERIMAGE adSCHEDULE adWORDPROCESSOR

stringssr

RX2 WPS, DOC DPS, PPT ET, XLS XSN XSF WPW CAJ

adSOUND adWORDPROCESSOR adPRESENTATION adSPREADSHEET adENCAPSULATION adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR

mw8sr kpp97rdr xlssr

Format Name
CAJ2_Fmt
KDH_Fmt
MS_DLL_Fmt
Hancom_Cell_2010_Fmt ESRI_Layer_Fmt JPEG_XL_Fmt NES_ROM_Fmt
Base64_ASCII_Fmt InDesign1_Fmt HP_PCL_XL_Fmt
SubStation_Alpha_Fmt SAMI_Fmt
Advanced_Authoring_Fmt
MF_COBOL_Library_Fmt MF_COBOL_Intermediate_Fmt
MF_COBOL_Generated_Fmt
Autodesk_EAGLE_Fmt Autodesk_EAGLE_XML_Fmt Omnis_Studio_Fmt Seclore_Fmt
Acorn_Draw_Fmt Hadoop_Sequence_File_Fmt Archicad_GSM_Fmt

Number
1668
1669
1670
1671 1672 1673 1674
1675 1676 1677
1678 1679
1680
1681 1682
1683
1684 1685 1686 1687
1688 1689 1690

Category
1599
1600
1601
1602 1603 1604 1605
1606 1607 1608
1609 1610
1611
1612 1613
1614
1615 1616 1617 1618
1619 1620 1621

Description

MIME Type

document (2010-)

Chinese Academic Journal CAJ document (2005-2010)

Chinese Academic Journal KDH document (2000-2005)

Microsoft Dynamic Link Library (DLL)

Hancom Office Cell 2010 document

ESRI Layer file

application/x-esri-layer

JPEG XL image

image/jxl

Nintendo Entertainment System (NES) ROM

application/x-nesrom

Base64-encoded ASCII text file

Adobe InDesign v1 document

application/x-indesign

HP Printer Control Language XL (PCL XL)

application/vnd.hp-pclxl

SubStation Alpha subtitle document

Synchronized Accessible Media Interchange (SAMI) subtitle document

Advanced Authoring Format (AAF) for data interchange

Micro Focus COBOL library

Micro Focus Net Express intermediate file

Micro Focus COBOL generated code file

Autodesk EAGLE library

Autodesk EAGLE XML library

Omnis Studio file

A Seclore-encrypted document whose format cannot be determined

Acorn Draw image

Apache Hadoop sequence file

Archicad library part (GSM) file

model/vnd.gdl

Extension
CAJ KDH, CAJ DLL, PYD CELL LYR JXL NES
INDD PXL, PRN SSA, ASS SMI, SAMI

File Class

Readers

adWORDPROCESSOR

adWORDPROCESSOR

adLIBRARY

adSPREADSHEET adGIS adRASTERIMAGE adMISC

adENCAPSULATION adDESKTOPPUBLSH adVECTORGRAPHIC

pxlsr

adWORDPROCESSOR adWORDPROCESSOR

AAF

adMOVIE

LBR

adLIBRARY

INT

adLIBRARY

GNT

adLIBRARY

LBR LBR DF1, LBR, LBS

adCAD adCAD adDATABASE adENCAPSULATION

SEQUENCEFILE GSM

adVECTORGRAPHIC adDATABASE adCAD

Format Name | Number | Category | Description | MIME Type | Extension | File Class | Readers
Autodesk_Point_Cloud_Fmt | 1691 | 1622 | Autodesk Indexed Point Cloud | | PCG | adCAD |
Autodesk_ReCap_Scan_Fmt | 1692 | 1623 | Autodesk ReCap Scan | | RCS | adCAD |
Autodesk_ReCap_Project_Fmt | 1693 | 1624 | Autodesk ReCap Project | | RCP | adCAD |
BRL_CAD_Binary_Fmt | 1694 | 1625 | BRL-CAD binary database (v5) | | G | adCAD |
Cartesian_Perceptual_Compression_Fmt | 1695 | 1626 | Cartesian Perceptual Compression image | image/cpi | CPC, CPI | adRASTERIMAGE |
Clarion_Database_Fmt | 1696 | 1627 | Clarion database | | DAT | adDATABASE |
ColoRIX_Fmt | 1697 | 1628 | ColoRIX image | | RIX, SCX, SCI | adRASTERIMAGE |
Compressed_ISO_Fmt | 1698 | 1629 | Compressed ISO CD image (CISO) | application/x-compressed-iso | CSO | adENCAPSULATION |
Corel_RAVE_Fmt | 1699 | 1630 | Corel R.A.V.E. animation | | CLK | adANIMATION |
Clicker_eBook_Fmt | 1700 | 1631 | Crick Clicker eBook | | CLK | adWORDPROCESSOR |

¹ MHT, EML, and MBX files might return format 2, 233, or 395, depending on the text in the file. In general, files that contain fields such as To, From, Date, or Subject are considered to be email messages; files that contain fields such as content-type and mime-version are considered to be MHT files; and files that do not contain any of those fields are considered to be text files.
² All CAT file extensions, for example CATDrawing, CATProduct, CATPart, and so on.
³ This format is returned only if you enable source code identification. See Source Code Identification, on page 84.
⁴ This format is returned only if you enable extended source code identification. See Source Code Identification, on page 84.
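
The rule of thumb in the first footnote (email fields, then MHT fields, otherwise plain text) can be illustrated with a short Java sketch. This is not KeyView's detection code, which is not shown in this guide. The class `MailOrTextGuess`, the enum `Guess`, the line-by-line header scan, and the decision to test the email fields before the MHT fields are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of the footnote's rule of thumb; not a KeyView API.
class MailOrTextGuess {

    enum Guess { EMAIL, MHT, TEXT }

    // Field names taken from the footnote ("fields such as ...").
    private static final List<String> EMAIL_FIELDS =
            Arrays.asList("to", "from", "date", "subject");
    private static final List<String> MHT_FIELDS =
            Arrays.asList("content-type", "mime-version");

    // Returns true if any line looks like "<name>: value" with a name in the list.
    static boolean hasField(String text, List<String> names) {
        for (String line : text.split("\\r?\\n")) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // no field name on this line
            }
            String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            if (names.contains(name)) {
                return true;
            }
        }
        return false;
    }

    // Assumption: email fields take precedence over MHT fields.
    static Guess classify(String text) {
        if (hasField(text, EMAIL_FIELDS)) {
            return Guess.EMAIL;
        }
        if (hasField(text, MHT_FIELDS)) {
            return Guess.MHT;
        }
        return Guess.TEXT;
    }

    public static void main(String[] args) {
        System.out.println(classify("From: a@example.com\nSubject: hi\n\nbody"));
        System.out.println(classify("MIME-Version: 1.0\nContent-Type: multipart/related\n\n<html></html>"));
        System.out.println(classify("plain notes without header fields"));
    }
}
```

A real detector would inspect only the header block and weigh several signals, which is why the footnote says "in general"; the sketch only shows why the same extension can map to different format numbers.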

File Classes

Attribute Number | Description | File class
0 | No file class | AutoDetNoFormat
01 | Word processor | adWORDPROCESSOR
02 | Spreadsheet | adSPREADSHEET
03 | Database | adDATABASE
04 | Raster image | adRASTERIMAGE
05 | Vector graphic | adVECTORGRAPHIC
06 | Presentation | adPRESENTATION
07 | Executable | adEXECUTABLE
08 | Encapsulation | adENCAPSULATION
09 | Sound | adSOUND
10 | Desktop publishing | adDESKTOPPUBLSH
11 | Outline/planning | adOUTLINE
12 | Miscellaneous | adMISC
13 | Mixed format | adMIXED
14 | Font | adFONT
15 | Time scheduling | adSCHEDULE
16 | Communications | adCOMMUNICATION
17 | Object module | adOBJECTMODULE
18 | Library module | adLIBRARY
19 | Fax | adFAXFORMAT
20 | Movie | adMOVIE
21 | Animation | adANIMATION
22 | Source Code | adSOURCECODE
23 | Computer-Aided Design | adCAD
24 | BI and analysis tools | adANALYTICS
25 | Scientific data | adSCIENTIFIC
26 | Geographic Info System | adGIS
27 | Software Development | adSWDEV
28 | Operating System | adOS
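
When application code needs to turn a file class number into a readable label, the table above can be mirrored in a small lookup type. The following is a minimal sketch, assuming you maintain the mapping yourself; `FileClassLookup` and its nested `Row` enum are hypothetical names and are not part of the KeyView Java API. The numbers, descriptions, and file class names are copied from the table.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper that mirrors the File Classes table; not a KeyView SDK type.
class FileClassLookup {

    enum Row {
        AUTO_DET_NO_FORMAT(0, "No file class", "AutoDetNoFormat"),
        WORD_PROCESSOR(1, "Word processor", "adWORDPROCESSOR"),
        SPREADSHEET(2, "Spreadsheet", "adSPREADSHEET"),
        DATABASE(3, "Database", "adDATABASE"),
        RASTER_IMAGE(4, "Raster image", "adRASTERIMAGE"),
        VECTOR_GRAPHIC(5, "Vector graphic", "adVECTORGRAPHIC"),
        PRESENTATION(6, "Presentation", "adPRESENTATION"),
        EXECUTABLE(7, "Executable", "adEXECUTABLE"),
        ENCAPSULATION(8, "Encapsulation", "adENCAPSULATION"),
        SOUND(9, "Sound", "adSOUND"),
        DESKTOP_PUBLISHING(10, "Desktop publishing", "adDESKTOPPUBLSH"),
        OUTLINE(11, "Outline/planning", "adOUTLINE"),
        MISC(12, "Miscellaneous", "adMISC"),
        MIXED(13, "Mixed format", "adMIXED"),
        FONT(14, "Font", "adFONT"),
        SCHEDULE(15, "Time scheduling", "adSCHEDULE"),
        COMMUNICATION(16, "Communications", "adCOMMUNICATION"),
        OBJECT_MODULE(17, "Object module", "adOBJECTMODULE"),
        LIBRARY(18, "Library module", "adLIBRARY"),
        FAX(19, "Fax", "adFAXFORMAT"),
        MOVIE(20, "Movie", "adMOVIE"),
        ANIMATION(21, "Animation", "adANIMATION"),
        SOURCE_CODE(22, "Source Code", "adSOURCECODE"),
        CAD(23, "Computer-Aided Design", "adCAD"),
        ANALYTICS(24, "BI and analysis tools", "adANALYTICS"),
        SCIENTIFIC(25, "Scientific data", "adSCIENTIFIC"),
        GIS(26, "Geographic Info System", "adGIS"),
        SWDEV(27, "Software Development", "adSWDEV"),
        OS(28, "Operating System", "adOS");

        // Index built once so lookups by number do not scan values().
        private static final Map<Integer, Row> BY_NUMBER = new HashMap<>();
        static {
            for (Row row : values()) {
                BY_NUMBER.put(row.number, row);
            }
        }

        final int number;
        final String description;
        final String fileClass;

        Row(int number, String description, String fileClass) {
            this.number = number;
            this.description = description;
            this.fileClass = fileClass;
        }

        // Empty result for numbers that the table does not list.
        static Optional<Row> fromNumber(int number) {
            return Optional.ofNullable(BY_NUMBER.get(number));
        }
    }

    public static void main(String[] args) {
        Row.fromNumber(26).ifPresent(row ->
                System.out.println(row.fileClass + " (" + row.description + ")"));
    }
}
```

Returning an `Optional` keeps unknown numbers explicit, which may matter if a later release adds file classes that this hand-maintained copy does not yet list.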

Appendix B: Document Readers

This section lists the KeyView document readers that are available to filter, export, and view supported file formats.

· Key to Document Readers Table
· Document Readers

Key to Document Readers Table

The document readers table includes the following information.

Column | Description
Reader | The name of the reader.
Description | A description of the reader.
Filter | Shows whether KeyView can filter text from the main content of the file.
Export | Shows whether KeyView supports export to HTML, XML, and PDF.
View | Shows whether KeyView provides viewing capability.
Extract | Shows whether KeyView can extract sub-files.
Metadata | Shows whether KeyView can extract metadata (properties such as title, author, and subject).
Charset | Shows whether KeyView can detect and extract the character set. Even though a file format might be able to provide character set information, some documents might not contain character set information. Therefore, the document reader would not be able to determine the character set of the document.
H/F | Shows whether KeyView can extract headers and footers.
Associated File Formats | The file formats that are supported by the reader.

Key to Symbols

Symbol | Description
Y | The feature is supported.
N | The feature is not supported.
P | Partial metadata is extracted from this format. Some non-standard fields are not extracted.
T | Only text is extracted from this format. Formatting information is not extracted.
M | Only metadata (title, subject, author, and so on) is extracted from this format. Text and formatting information are not extracted.

Document Readers

Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
ActiveX components | Microsoft Visio (2013) | N | N | Y¹ | N | Y | N | N | MS_Visio_2013_Fmt
ad1sr | AD1 Evidence file | N | N | Y | Y | N | n/a | N | AD1_Fmt
afmsr | Adobe Font Metrics | Y | T | T | N | N | N | N | Adobe_Font_Metrics_Fmt
afsr | ASCII Text | Y | Y | Y | N | N | N | N | ABAP_Fmt, AMPL_Fmt, APL_Fmt, ASCII_Text_Fmt, ASN1_Fmt, ATS_Fmt, Agda_Fmt, Alloy_Fmt, Apex_Fmt, AppleScript_Fmt, Arduino_Fmt, AsciiDoc_Fmt, AspectJ_Fmt, Assembly_Fmt, Awk_Fmt, BlitzMax_Fmt, Bluespec_Fmt, Brainfuck_Fmt, Brightscript_Fmt, CLIPS_Fmt, CMake_Fmt, COBOL_Fmt, CPlusPlus_Fmt, CWeb_Fmt, C_Fmt, CartoCSS_Fmt, Ceylon_Fmt, Chapel_Fmt, Clarion_Fmt, Clean_Fmt, Clojure_Fmt, CoffeeScript_Fmt, Component_Pascal_Fmt, Cool_Fmt, Coq_Fmt, Creole_Fmt, Crystal_Fmt, Csharp_Fmt, Csound_Document_Fmt, Csound_Fmt, Css_Fmt, Cuda_Fmt, DIGITAL_Command_Language_Fmt, DTrace_Fmt, D_Fmt, Dart_Fmt, Dockerfile_Fmt, ECL_Fmt, E_Fmt, Eiffel_Fmt, Elm_Fmt, Emacs_Lisp_Fmt, EmberScript_Fmt, Erlang_Fmt, Fantom_Fmt, Forth_Fmt, Fortran_Fmt, FreeMarker_Fmt, Frege_Fmt, Fsharp_Fmt, GAMS_Fmt, GAP_Fmt, GDScript_Fmt, GIS_World_File_Fmt, GLSL_Fmt, G_code_Fmt, Game_Maker_Language_Fmt, Gnuplot_Fmt, Go_Fmt, Golo_Fmt, Gosu_Fmt, Gradle_Fmt, GraphQL_Fmt, Graphviz_DOT_Fmt, Groovy_Fmt, HLSL_Fmt, Hack_Fmt, Haml_Fmt, Handlebars_Fmt, Haskell_Fmt, Hy_Fmt, IDL_Fmt, IGOR_Pro_Fmt, Idris_Fmt, Inform_7_Fmt, Ini_Fmt, Ioke_Fmt, Isabelle_Fmt, JSONiq_Fmt, JSX_Fmt, J_Fmt, Jasmin_Fmt, Java_Fmt, Javascript_Fmt, Jolie_Fmt, Julia_Fmt, KV_Mail_Subfile_Fmt, KiCad_Layout_Fmt, KiCad_Schematic_Fmt, Kotlin_Fmt, LFE_Fmt, LOLCODE_Fmt, Lasso_Fmt, Limbo_Fmt, Lisp_Fmt, LiveScript_Fmt, Lua_Fmt, MAXScript_Fmt, ML_Fmt, MSDOS_Batch_File_Fmt, M_Fmt, Makefile_Fmt, Markdown_Fmt, Mathematica_Fmt, Matlab_Fmt, Max_Code_Fmt, Mercury_Fmt, Modelica_Fmt, Modula_2_Fmt, Monkey_Fmt, Moocode_Fmt, NL_Fmt, NSIS_Fmt, NetLogo_Fmt, NewLisp_Fmt, Nginx_Fmt, Nix_Fmt, Nu_Fmt, OCaml_Fmt, ObjC_Fmt, ObjCpp_Fmt, ObjJ_Fmt, OpenCL_Fmt, OpenEdge_ABL_Fmt, OpenSCAD_Fmt, Ox_Fmt, Oxygene_Fmt, Oz_Fmt, PAWN_Fmt, PHP_Fmt, PLSQL_Fmt, PLpgSQL_Fmt, Pan_Fmt, Parrot_Assembly_Fmt, Pascal_Fmt, Perl_Fmt, PicoLisp_Fmt, Pike_Fmt, Pony_Fmt, Powershell_Fmt, Processing_Fmt, Prolog_Fmt, Puppet_Fmt, PureBasic_Fmt, Python_Fmt, QMake_Fmt, RAML_Fmt, RDoc_Fmt, REXX_Fmt, R_Fmt, Racket_Fmt, Ragel_Fmt, Rascal_Fmt, Rebol_Fmt, Red_Fmt, RenPy_Fmt, RenderScript_Fmt, Ring_Fmt, RobotFramework_Fmt, Ruby_Fmt, Rust_Fmt, SAS_Fmt, SGML_Fmt, SPARQL_Fmt, SQLPL_Fmt, SQL_Fmt, SaltStack_Fmt, Scala_Fmt, Scheme_Fmt, Scilab_Fmt, Scribe_Fmt, Shell_Fmt, Smalltalk_Fmt, Squirrel_Fmt, Stan_Fmt, Stata_Fmt, Stylus_Fmt, SuperCollider_Fmt, Swift_Fmt, SystemVerilog_Fmt, TSV_Fmt, TXL_Fmt, Tcl_Fmt, Tex_Fmt, Turing_Fmt, Turtle_Fmt, TypeScript_Fmt, UrWeb_Fmt, Verilog_Fmt, Vim_script_Fmt, Visual_Basic_Fmt, WebAssembly_Fmt, WebIDL_Fmt, Wiki_Fmt, X10_Fmt, XQuery_Fmt, Xojo_Fmt, Xtend_Fmt, YAML_Fmt, YANG_Fmt, Zephir_Fmt, eC_Fmt, reStructuredText_Fmt, xBase_Fmt
aiffsr | Audio Interchange File Format | M | N | N | N | Y | N | N | AIFF_Fmt
asfsr | Advanced Systems Format (1.2) | N | N | N | N | Y | N | N | ASF_Fmt, WMA_Fmt, WMV_Fmt
assr | Applix Spreadsheets (4.2, 4.3, 4.4) | Y | Y | Y | N | N | Y | N | Applix_Spreadsheets_Fmt

¹ Visio 2013 is supported in Viewing only, with the support of ActiveX components from the Microsoft Visio 2013 Viewer. Image fidelity is supported but other features, such as highlighting, are not.

Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
avrosr¹ | Apache Avro binary format | Y | N | N | N | N | N | N | Avro_Fmt
awsr | Applix Words (3.11, 4, 4.1, 4.2, 4.3, 4.4) | Y | Y | Y | N | N | Y | Y | Applix_Words_Fmt
axsr | Applix Asterix | Y | T | T | N | N | N | N | Applix_Alis_Fmt
b1sr | B1 | N | N | Y | Y | N | n/a | N | B1_Fmt
bkfsr | Microsoft Backup File | N | N | Y | Y | N | n/a | N | BKF_Fmt
bmpsr | Windows Bitmap Image | M | M | N | N | Y | N | N | BMP_Fmt
bzip2sr | Bzip2 Compressed File | N | N | Y | Y | N | n/a | N | BZIP2_Fmt
cabsr | Microsoft Cabinet File (1.3) | N | N | Y | Y | N | n/a | N | CAB_Fmt
cdsr | Convergent Technologies DEF Comm. Format | Y | T | T | N | N | N | N | CT_DEF_Fmt
cebsr² | Founder Chinese E-paper Basic (3.2.1) | Y | N | N | N | N | N | N | Founder_CEB_Fmt

¹ The avrosr reader is only available on certain platforms (see avrosr in the platform differences section).
² The cebsr reader is only available on certain platforms (see cebsr in the platform differences section). Because of known security vulnerabilities in the third-party library used for this format, cebsr is disabled in formats.ini and needs to be explicitly enabled if you wish to use it.

Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
chmsr | Microsoft Compiled HTML Help (3) | N | N | Y | Y | N | n/a | N | CHM_Fmt
csvsr | CSV (Comma Separated Values) | Y | Y | Y | N | N | N | N | CSV_Fmt
dbfsr | dBase Database (III+, IV) | Y | Y | Y | N | N | N | N | dBase_Fmt
dbxsr | Microsoft Outlook Express DBX Message Database (5.0, 6.0) | N | N | Y | Y | Y | Y | N | MS_OEDBX_Fmt
dcasr | IBM DCA/RFT (Revisable Form Text) (SC23-0758-1) | Y | Y | Y | N | N | Y | N | DCA_RFT_Fmt
dcmsr | Digital Imaging & Communications in Medicine (DICOM) | M | N | N | N | Y | N | N | Dicom_Fmt
difsr | Data Interchange Format | Y | Y | Y | N | N | N | N | DIF_SpreadSheet_Fmt
dmgsr | Mac Disk Copy Disk Image | N | N | Y | Y | N | n/a | N | DMG_Fmt
dw4sr | DisplayWrite (4) | Y | Y | Y | N | N | Y | N | IBM_Display_Write_Fmt
dxlsr | IBM Domino Data in XML format¹ | N | N | Y | Y | Y | N | N | Lotus_Domino_DXL_Fmt

¹ Supports non-encrypted embedded files only.

Reader emlsr1
emxsr
encase2sr
encasesr
entsr epubsr
exesr foliosr gdsiisr gifsr

Description

Filter Export View Extract Metadata Charset H/F Associated File Formats

Text Mail (MIME) / Y

T

T

Y

Y

Microsoft Outlook

Express (Windows

6, MacIntosh 5)

Y

N

SMTP_Fmt

Legato

N

N

Y

Y

N

EMailXtender

Archives

n/a

N

EMX_Fmt

Expert Witness

N

N

Y

Y

N

Compression

Format (EnCase) (7)

n/a

N

EnCase_Fmt

Expert Witness

N

N

Y

Y

N

Compression

Format (EnCase) (6)

n/a

N

EnCase_Fmt

Microsoft Entourage N

N

Y

Y

Y

Database (2004)

Y

N

ENT_Fmt

Open Publication Y

Y

Y

N

Y

Structure eBook

(2.0, 3.0)

Y

N

Epub_Fmt, iBooks_Fmt

MSDOS/Windows N

N

Y

N

N

Executable

n/a

N

MS_Executable_Fmt

Folio Flat File (3.1) Y

Y

Y

N

Y

Y

Y

Folio_Flat_Fmt

GDSII data format Y

T

T

N

N

N

N

GDSII_Fmt

GIF (87, 89)

M

M

N

N

Y

N

N

GIF_87a_Fmt, GIF_89a_Fmt

1This reader supports both clear signed and encrypted S/MIME. KeyView supports S/MIME for PST, EML, MBX, and MSG files. IDOL KeyView (12.12)

Page 180 of 280

Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
gitpacksr | Git Packfile | N | N | Y | Y | N | n/a | N | Git_Packfile_Fmt
gwfssr | GroupWise FileSurf email | N | N | Y | Y | Y | N | N | GWFS_Email_Fmt
hl7sr | Health level7 message (2.0) | Y | Y | Y | N | Y | Y | N | Hl7_Fmt
htmlsr [1] | HTML | N | N | N | N | Y | N | N | MS_Excel_HTML_Fmt, MS_Word_HTML_Fmt
htmsr | HTML/XHTML (3, 4) | Y | Y | Y | N | Y [2] | Y | N | HTML_Fmt, Netscape_Bookmark_File_Fmt
hwposr | Haansoft Hangul HWP (2002, 2005, 2007, 2010) | Y | Y | Y | Y | Y | Y | N | HWP_Fmt
hwpsr | Haansoft Hangul HWP (97) | Y | Y | Y | N | Y | Y | N | HWP_Fmt
hwpxsr | Haansoft Hangul HWPX | Y | T | T | N | N | Y | N | HWPX_Fmt
ichatsr | Apple iChat Log (1, AV 2, AV 2.1, AV 3) | Y | Y | Y | N | N | N | N | Apple_iChat_Fmt
icssr | Microsoft Outlook iCalendar (1.0, 2.0) | N | N | Y | Y | Y | Y | N | ICS_Fmt
isosr | ISO-9660 CD Disc Image | N | N | Y | Y | N | n/a | N | ISO_Fmt

[1] The htmlsr reader is only available on certain platforms (see htmlsr in the platform differences section).
[2] HTML only supports partial metadata extraction.

Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
iwss13sr [1] | Apple iWork Numbers ('13, '16, '18, iCloud 2018) | Y | T | T | N | N | Y | N | IWSS13_Fmt
iwsssr | Apple iWork Numbers ('08, '09) | Y | Y | Y | N | Y | Y | N | IWSS_Fmt
iwwp13sr [2] | Apple iWork Pages ('13, '16, '18, iCloud 2018) | Y | T | T | N | N | N | N | IWWP13_Fmt
iwwpsr | Apple iWork Pages ('08, '09) | Y | Y | Y | N | Y | Y | N | IWWP_Fmt
jp2000sr | JPEG (2000) | M | M | N | N | Y | N | N | ISO_JPEG2000_JP2_Fmt, ISO_JPEG2000_JPM_Fmt, ISO_JPEG2000_JPX_Fmt, JPEG_2000_JP2_File_Fmt, JPEG_2000_PGX_Fmt, Motion_JPEG_2000_Fmt
jpgsr | JPEG Interchange Format (JFIF) | M | M | N | N | Y | N | N | JPEG_File_Interchange_Fmt
jtdsr | JustSystems Ichitaro (8 to 2013, 2018) | Y | Y | Y | N | P | N | Y | ICHITARO_Compr_Fmt, ICHITARO_Fmt
kpagrdr | Applix Presents/Graphics (4.0, 4.2, 4.3, 4.4) | Y | Y | Y | N | N | N | N | Applix_Graphics_Fmt

[1] The iwss13sr reader is only available on certain platforms (see iwss13sr in the platform differences section).
[2] The iwwp13sr reader is only available on certain platforms (see iwwp13sr in the platform differences section).

Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
kpanirdr | Windows Animated Cursor | N | Y | Y | N | N | N | N | Windows_Animated_Cursor_Fmt
kpbmprdr | Windows Bitmap Image | Y [1] | Y | Y | N | N | N | N | BMP_Fmt
kpCATrdr | CATIA formats (5) | Y | N | N | N | Y | N | N | CATIA_Fmt
kpcdrrdr | CorelDRAW [2] (through 9.0, 10, 11, 12, X3) | N | Y | Y | N | N | N | N | Corel_Draw_Fmt
kpcgmrdr [3] | Computer Graphics Metafile | Y | Y | Y | N | N | N | N | CGM_Binary_Fmt, CGM_Character_Fmt, CGM_ClearText_Fmt
kpchtrdr | Microsoft Excel (2-7) and Lotus 1-2-3 Charts (2-5) | N | Y | Y | N | N | N | N |
kpdcxrdr | DCX Fax System | N | Y | Y | N | N | N | N | DCX_Fmt
kpDWGrdr [4] | Autodesk AutoCAD DWG Drawing (R13 onwards) | Y | Y | Y | N | Y | Y | N | AutoDesk_DWG_Fmt
kpDXFrdr [5] | Autodesk AutoCAD DXF Drawing (R13 onwards) | Y | Y | Y | N | Y | Y | N | AutoCAD_DXF_Binary_Fmt, AutoCAD_DXF_Text_Fmt

[1] Filtering is supported through OCR, which is only available on certain platforms (see Optical Character Recognition in the platform differences section), and is licensed separately.
[2] CDR/CDR with TIFF header.
[3] Files with non-partitioned data are supported.
[4] The kpDWGrdr reader exists to provide DWG support on platforms where kpODArdr is not available (see kpDWGrdr in the platform differences section), but does not support graphics for versions after 2004 or text for versions after 2013.
[5] The kpDXFrdr reader exists to provide DXF support on platforms where kpODArdr is not available (see kpDXFrdr in the platform differences section), but does not support graphics for versions after 2004.

Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
kpemfrdr | Enhanced Metafile | Y | Y | Y | N | Y | N | N | Enhanced_Metafile_Fmt
kpepsrdr | Encapsulated PostScript (raster) (TIFF header) | N | Y | Y | N | N | N | N | EPSF_Fmt, Preview_EPSF_Fmt
kpGFLrdr | Omni Graffle | Y | N | N | N | Y | Y | N | Omni_Graffle_XML_Fmt
kpgifrdr | GIF (87, 89) | Y [1] | Y | Y | N | N | N | N | GIF_87a_Fmt, GIF_89a_Fmt
kpicordr | Windows Icon Cursor | N | Y | Y | N | N | N | N | Windows_Icon_Fmt
kpIWPG13rdr [2] | Apple iWork Keynote ('13, '16, '18, iCloud 2018) | Y | T | N | N | N | N | N | IWPG13_Fmt
kpIWPGrdr | Apple iWork Keynote (2, 3, '08, '09) | Y | Y | Y | N | Y | Y | N | IWPG13_Fmt, IWPG_Fmt
kpJBIG2rdr | JBIG2 | Y [1] | Y | Y | N | N | N | N | JBIG2_Fmt
kpjp2000rdr | JPEG (2000) | Y [1] | Y | Y | N | N | N | N | ISO_JPEG2000_JP2_Fmt, ISO_JPEG2000_JPM_Fmt, ISO_JPEG2000_JPX_Fmt, JPEG_2000_JP2_File_Fmt, JPEG_2000_PGX_Fmt, Motion_JPEG_2000_Fmt

[1] Filtering is supported through OCR, which is only available on certain platforms (see Optical Character Recognition in the platform differences section), and is licensed separately.
[2] The kpIWPG13rdr reader is only available on certain platforms (see kpIWPG13rdr in the platform differences section).

Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
kpjpgrdr | JPEG Interchange Format (JFIF) | Y [1] | Y | Y | N | N | N | N | JPEG_File_Interchange_Fmt
kpmacrdr | MacPaint | N | Y | Y | N | N | N | N | MacPaint_Fmt
kpmsordr | Microsoft Office Drawing | N | Y | Y | N | N | N | N | MS_Office_Drawing_Fmt
kpODArdr [2] | ODA | Y | Y | Y | N | Y | Y | N | AutoCAD_DXF_Binary_Fmt, AutoCAD_DXF_Text_Fmt, AutoDesk_DWG_Fmt
kpodfrdr | OASIS Open Document Format (1, 2 [3]) | Y | Y | Y | Y [4] | Y | Y | N | ODF_Drawing_Fmt, ODF_Drawing_Template_Fmt, ODF_Presentation_Fmt, ODF_Presentation_Template_Fmt, SO_Drawing_XML_Fmt, SO_Presentation_XML_Fmt
kpp40rdr | Microsoft PowerPoint (98) | Y | Y | Y | N | P [5] | N | N | PowerPoint_Win_Fmt
kpp95rdr | Microsoft PowerPoint Windows (95) | Y | Y | Y | N | P | Y | N | PowerPoint_95_Fmt

[1] Filtering is supported through OCR, which is only available on certain platforms (see Optical Character Recognition in the platform differences section), and is licensed separately.
[2] The kpODArdr reader is only available on certain platforms (see kpODArdr in the platform differences section).
[3] Generated by OpenOffice Impress 2.0, StarOffice 8 Impress, and IBM Lotus Symphony Presentation 3.0.
[4] Supported using the olesr embedded objects reader.
[5] Microsoft PowerPoint Windows only.

Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
kpp97rdr | Microsoft PowerPoint (97-2004) | Y | Y | Y | N | P | Y | Y [1] | PowerPoint_2000_Fmt, PowerPoint_97_Fmt, WPS_Office_PG_Fmt
kppctrdr | Macintosh Raster / QuickDraw (2) | N | Y | Y | N | N | N | N | Mac_PICT_Fmt
kppcxrdr | PC PaintBrush (3) | N | Y | Y | N | N | N | N | PC_Paintbrush_Fmt
kppdf2rdr [2] | Adobe PDF (1.1 to 1.7, 2.0) | N | N | Y | N | N | N | N | PDF_Fmt
kppdfrdr | Adobe PDF (1.1 to 1.7, 2.0) | N | Y | Y | N | N | N | N | PDF_Fmt
kppicrdr | Lotus PIC | Y | Y | Y | N | N | N | N | Lotus_PIC_Fmt
kppngrdr | Portable Network Graphics | Y [3] | Y | Y | N | N | N | N | APNG_Fmt, PNG_Fmt
kpppxrdr | Microsoft PowerPoint Windows XML (2007 onwards) | Y | Y | Y | Y | Y | Y | Y | MS_PPT_2007_Fmt, MS_PPT_Macro_2007_Fmt
kpprerdr | Lotus Freelance Graphics 2 (2) | Y | Y | Y | N | N | N | N | Freelance_OS2_Fmt, Freelance_Win_Fmt

[1] Microsoft PowerPoint Windows only.
[2] kppdf2rdr is an alternate graphic-based reader that produces high-fidelity output but does not support other features such as highlighting or text searching. The kppdf2rdr reader is only available on certain platforms (see kppdf2rdr in the platform differences section).
[3] Filtering is supported through OCR, which is only available on certain platforms (see Optical Character Recognition in the platform differences section), and is licensed separately.

Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
kpprzrdr | Lotus Freelance Graphics (96, 97, 98, R9, 9.8) | Y | Y | Y | N | N | N | N | Freelance_96_Fmt, Freelance_97_Fmt, Freelance_DOS_Fmt
kpsddrdr | StarOffice Impress (3, 4, 5) | Y | T | N | N | N | N | N | SO_Presentation_Fmt
kpsdwrdr | Lotus AMIDraw Graphics | N | Y | Y | N | N | N | N | Ami_Pro_Draw_Fmt, SO_Text_Fmt
kpsgirdr | SGI RGB Image | N | Y | Y | N | N | N | N | SGI_Image_Fmt
kpshwrdr | Corel Presentations (6, 7, 8, 9, 10, 11, 12, X3) | Y | Y | Y | N | N | N | N | Corel_Presentations_Fmt
kpsunrdr | Sun Raster Image | N | Y | Y | N | N | N | N | Sun_Raster_Fmt
kpTGArdr | Truevision Targa (2) | N | Y | Y | N | N | N | N | Targa_Fmt
kptifrdr | TIFF Tagged Image File (through 6.0 [1]) | Y [2] | Y | Y | N | N | N | N | TIFF_Fmt
kpUGrdr | Unigraphics (UG) NX | Y | N | N | N | N | N | N | Unigraphics_NX_Fmt

[1] The following compression types are supported: no compression, CCITT Group 3 1-Dimensional Modified Huffman, CCITT Group 3 T4 1-Dimensional, CCITT Group 4 T6, LZW, JPEG (only Gray, RGB and CMYK color space are supported), and PackBits.
[2] Filtering is supported through OCR, which is only available on certain platforms (see Optical Character Recognition in the platform differences section), and is licensed separately.

Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
kpVSD2rdr | Microsoft Visio (4, 5, 2000, 2002, 2003, 2007, 2010 [1]) | Y | Y | Y | N | Y | Y | N | MS_Visio_Fmt
kpVSDXrdr | Microsoft Visio (2013) | Y | Y | Y | Y | Y | Y | N | MS_Visio_2013_Fmt, MS_Visio_2013_Macro_Fmt, MS_Visio_2013_Stencil_Fmt, MS_Visio_2013_Stencil_Macro_Fmt, MS_Visio_2013_Template_Fmt, MS_Visio_2013_Template_Macro_Fmt
kpwg2rdr | WordPerfect Graphics 2 (2, 7) | N | Y | Y | N | N | N | N | WordPerfect_Graphics_Fmt
kpwmfrdr | Windows Metafile (3) | Y [2] | Y | Y | N | N | N | N | Windows_Metafile_Fmt, Windows_Metafile_NoHdr_Fmt
kpwpgrdr | WordPerfect Graphics 1 (1) | N | Y | Y | N | N | N | N | WordPerfect_Graphics_Fmt
kpXFDLrdr | Extensible Forms Description Language | Y | Y | Y | N | Y | Y | N | XFDL_Fmt
kvgz | GZIP archive (2) | N | N | Y | N | N | n/a | N | GZ_Compress_Fmt
kvgzsr | GZIP archive (2) | N | N | N | Y | N | n/a | N | GZ_Compress_Fmt
kvhqxsr | BinHex | N | N | Y | Y | N | n/a | N | BinHex_Fmt
kvzee | UNIX Compress | N | N | Y | N | N | n/a | N | Compress_Fmt

[1] Viewing and Export use the graphic reader, kpVSD2rdr for Microsoft Visio 2003, 2007, and 2010, and vsdsr for all earlier versions. Image fidelity in Viewing and Export is therefore only supported for versions 2003 and above. Filter uses the graphic reader kpVSD2rdr for Microsoft Visio 2003, 2007, and 2010, and vsdsr for all earlier versions.
[2] Windows Metafiles can contain both raster images (KeyView file class 4) and vector graphics (KeyView file class 5). Filtering is supported only for vector graphics (class 5).

Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
kvzeesr | UNIX Compress | N | N | N | Y | N | n/a | N | Compress_Fmt
l123sr | Lotus 1-2-3 (96, 97, R9, 9.8) | Y | Y | Y | N | P | Y | N | Lotus_123_97_Fmt, Lotus_123_Format_Fmt, Lotus_123_R9_Fmt
lasr | Lotus AMI Pro and Write Plus (2, 3) | Y | Y | Y | N | P [1] | Y [2] | Y | Ami_Pro_Fmt, Ami_Pro_StyleSheet_Fmt
lwpsr [3] | Lotus Word Pro and SmartMaster (96, 97, R9) | Y | Y | Y | N | P [4] | N | Y [5] | Lotus_Word_Pro_96_Fmt, Lotus_Word_Pro_97_Fmt
lyxsr | LyX Word Processor | Y | T | T | N | N | N | N | LyX_Fmt
lzhsr | Microsoft LZH Compressed Folder | N | N | N | Y | N | n/a | N | LZH_Fmt
macbinsr | MacBinary | N | N | Y | Y | N | n/a | N | MacBinary_Fmt
mbsr | Microsoft Word Macintosh (4, 5, 6, 98) | Y | Y | Y | N | Y | N | Y | MS_Word_Mac_4_Fmt, MS_Word_Mac_Fmt
mbxsr [6] | Text Mail (MIME), Microsoft Outlook Express (Windows 6, MacIntosh 5), Mailbox [7] (Thunderbird 1.0, Eudora 6.2) | Y [8] | N | T | Y | Y | Y | N | MIME_Fmt

[1] Lotus AMI Pro only.
[2] Lotus AMI Pro only.
[3] The lwpsr reader is only available on certain platforms (see lwpsr in the platform differences section).
[4] Lotus Word Pro only.
[5] Lotus Word Pro only.
[6] This reader supports both clear signed and encrypted S/MIME. KeyView supports S/MIME for PST, EML, MBX, and MSG files.
[7] KeyView supports MBX files created by Eudora Email and Mozilla Thunderbird. MBX files created by other common mail applications are typically filtered, converted, and displayed.
[8] Text Mail only.

Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
MCI | Microsoft Media Control Interface | N | N | Y | N | N | N | N | AIFF_Fmt, AU_Audio_Fmt, ISO_QuickTime_Fmt, MIDI_Audio_Fmt, MPEG_Audio_Fmt, MS_Video_Fmt, MS_WAVE_Audio_Fmt, Mobile_QuickTime_Fmt, QuickTime_Fmt
mdbsr | Microsoft Access (95 onwards) | Y | T | T | N | N | Y [1] | N | MS_Access_2000_Fmt, MS_Access_2007_Fmt, MS_Access_95_Fmt, MS_Access_97_Fmt, MS_Access_Fmt
mhtsr | MIME HTML (MHTML) | Y | Y | Y | N | Y | Y | N | MHT_Fmt
mifsr | Adobe FrameMaker Interchange Format (5, 5.5, 6, 7) | Y | Y | Y | N | N | Y | N | Maker_Interchange_Fmt
misr | Microsoft Word Windows (1.0, 2.0) | Y | Y | Y | N | N | N | Y | MS_Word_Win_Fmt
mp3sr | MPEG-1 Audio layer3 (ID3 v1 and v2) | M | M | Y | N | Y | N | N | MPEG_Audio_Fmt

[1] Charset is not supported for Microsoft Access 95 or 97.

Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
mpeg4sr | MPEG video | M | N | N | N | Y | N | N | Adobe_Flash_Audio_Book_Fmt, Adobe_Flash_Audio_Fmt, Adobe_Flash_Protected_Video_Fmt, Adobe_Flash_Video_Fmt, Audible_Audiobook_Fmt, ISO_3GPP2_Fmt, ISO_3GPP_Fmt, ISO_IEC_MPEG_4_Fmt, KDDI_Video_Fmt, MPEG4_AVC_Fmt, MPEG4_M4A_Fmt, MPEG4_M4B_Fmt, MPEG4_M4P_Fmt, MPEG4_M4V_Fmt, MPEG4_Sony_PSP_Fmt, MPEG_21_Fmt, NTT_MPEG4_Fmt, Nero_MPEG4_Audio_Fmt, QuickTime_Fmt, Sony_XAVC_Fmt
mppsr | Microsoft Project (2000 onwards) | Y | Y | Y | Y | Y | Y | N | MS_Project_2000_Fmt, MS_Project_2007_Fmt, MS_Project_41_Fmt, MS_Project_4_Fmt, MS_Project_98_Fmt
msgsr [1] | Microsoft Outlook (97 onwards), Documentum EMCMF | Y [2] | T [3] | Y [4] | Y | Y | Y [5] | N | EMCMF_Fmt, MS_Outlook_Fmt
mspubsr | Microsoft Publisher (98 to 2016) | Y | T | T | Y | Y | Y | N | MS_Publisher_98_Fmt, MS_Publisher_Fmt
msw6sr | Microsoft Works Word Processor for Windows (6, 2000) | Y | Y | Y | N | N | N | Y | MS_Works_Win_WP_Fmt

[1] This reader supports both clear signed and encrypted S/MIME. KeyView supports S/MIME for PST, EML, MBX, and MSG files.
[2] Except Documentum EMCMF.
[3] Except Documentum EMCMF.
[4] For Outlook this is Text only.
[5] Returns "Unicode" character set for Outlook version 2003 and up, and "Unknown" character set for previous versions.

Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
mswsr | Microsoft Works Word Processor for Windows (1, 2, 3, 4) | Y | Y | Y | N | N | N | Y | MS_Works_Win_WP_Fmt
multiarcsr [1] | Compressed formats | N | N | Y [2] | Y | N | n/a | N | ARJ_Fmt, RAR5_Fmt, XZ_Fmt
mw6sr | Microsoft Word for Windows (6, 7, 8, 95) | Y | Y | Y | N | Y | Y | Y | MS_Word_95_Fmt
mw8sr | Microsoft Word (97-2004) | Y | Y | Y | Y [3] | Y | Y | Y [4] | MS_Word_2000_Fmt, MS_Word_97_Fmt, WPS_Office_WP_Fmt
mwsr | Microsoft Word PC (4-6) and Windows Write (1-3) | Y | Y | Y | N | N | Y [5] | Y [6] | MS_Windows_Write_Fmt, MS_Word_PC_Driver_Fmt, MS_Word_PC_Fmt, MS_Word_PC_Glossary_Fmt, MS_Word_PC_Misc_Fmt, MS_Word_PC_StyleSheet_Fmt
mwssr | Microsoft Works Spreadsheet (2, 3, 4) | Y | Y | Y | N | N | Y | N | MS_Works_DOS_SS_Fmt, MS_Works_Mac_SS_Fmt, MS_Works_Win_SS_Fmt
mwxsr | Microsoft Word XML (2007 onwards) | Y | Y | Y | Y | Y | Y | Y | MS_Word_2007_Flat_XML_Fmt, MS_Word_2007_Fmt, MS_Word_Macro_2007_Fmt

[1] The multiarcsr reader is only available on certain platforms (see multiarcsr in the platform differences section). 7zip is supported with the multiarcsr reader on some platforms for Extract.
[2] 7-zip and SUN PEX archives only.
[3] Supported using the embedded objects reader olesr.
[4] Microsoft Word for Windows only.
[5] Microsoft Windows Write only.
[6] Microsoft Word PC only.

Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
nnsr | NBI OASys Net Archive | Y | T | T | N | N | N | N | NBI_Net_Archive_Fmt
nsfsr [1] | IBM Lotus Notes database (4, 5, 6.0, 6.5, 7.0, 8.0) | N | N | Y | Y | Y | N | N | Lotus_Notes_NSF_Fmt
oa2sr | Fujitsu Oasys (7) | Y | Y | Y | N | P | N | N | Oasys_Fmt
odfsssr | OASIS Open Document Format (1, 2 [2]) | Y | Y | Y | Y [3] | Y | Y | N | ODF_Spreadsheet_Fmt, ODF_Spreadsheet_Template_Fmt
odfwpsr | OASIS Open Document Format (1, 2 [4]) | Y | Y | Y | Y [5] | Y | Y | Y | ODF_Text_Fmt, ODF_Text_Master_Fmt, ODF_Text_Template_Fmt, ODF_Text_Web_Fmt, SO_Text_XML_Fmt
olesr | Windows Scrap File | N | N | N | Y | Y | n/a | N | Ability_WP_OLE_Fmt, Autodesk_3ds_Max_Fmt, Crystal_Reports_Fmt, FPX_Fmt, MS_AtWork_Fax_Fmt, MS_Binder_Fmt, MicroStation_V8_DGN_Fmt, OLE_Fmt, PageMagic_Fmt, PagePlus_Fmt, PhotoDraw_Mix_Fmt, PowerPoint_Mac_Fmt, SO_Chart_Fmt, SO_Database_Fmt, SO_Math_Fmt, Scrap_Fmt, SolidWorks_Fmt, Solid_Edge_Assembly_Fmt, Solid_Edge_Part_Fmt, Solid_Edge_SheetMetal_Fmt, Windows_Installer_Fmt, Windows_Installer_Patch_Fmt

[1] The nsfsr reader is only available on certain platforms (see nsfsr in the platform differences section).
[2] Generated by OpenOffice Calc 2.0, StarOffice 8 Calc, and IBM Lotus Symphony Spreadsheet 3.0.
[3] Supported using the embedded objects reader olesr.
[4] Generated by OpenOffice Writer 2.0, StarOffice 8 Writer, and IBM Lotus Symphony Documents 3.0.
[5] Supported using the embedded objects reader olesr.

Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
olmsr | Microsoft Outlook for Macintosh (2011) | N | N | Y | Y | N | Y | N | MS_OutlookOLM_Fmt
onesr | Microsoft OneNote (2007, 2010, 2013, 2016) | Y | Y | Y | Y | N | Y | N | OneNote_Fmt
onealtsr | Microsoft OneNote Alternative Packaging Format (2007 onwards) | Y | T | T | Y | N | N | N | OneNote_Alternate_Fmt
onmsr | Legato Extender | N | N | Y | Y | Y | N | N | Legato_Extender_ONM_Fmt
oo3sr | Omni Outliner (v3, OPML, OOutline) | Y | Y | Y | N | N | Y | N | OO3_Fmt, OOUTLINE_Fmt, OPML_Fmt
orcsr [1] | Apache ORC (Optimized Row Columnar) data | Y | N | N | N | N | N | N | Apache_ORC_Fmt
parquetsr [2] | Apache Parquet Database Format | Y | N | N | N | Y | N | N | Apache_Parquet_Fmt
pbixsr | Microsoft Power BI Desktop (1.11) | Y | T | T | N | N | Y | N | MS_Power_BI_Fmt
pdf2sr [3] | Adobe PDF (1.1 to 1.7, 2.0) | Y | Y | N | Y | Y | N | N | PDF_Fmt

[1] The orcsr reader is only available on certain platforms (see orcsr in the platform differences section).
[2] The parquetsr reader is only available on certain platforms (see parquetsr in the platform differences section).
[3] The pdf2sr reader is only available on certain platforms (see pdf2sr in the platform differences section).

Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
pdfsr | Adobe PDF (1.1 to 1.7, 2.0) | Y | Y | N | Y [1] | Y | Y | N | PDF_Fmt, Portfolio_PDF_Fmt
pfasr | ASCII Printer and PostScript fonts | Y | T | T | N | N | N | N | PostScript_Font_Fmt, Printer_Font_ASCII_Fmt
pffsr [2] | Microsoft Outlook Offline Storage File (97 onwards) | N | N | Y | Y | Y | Y | N | MS_OutlookOST_Fmt
pfilesr | Rights Management Services (RMS)-protected format | Y [3] | T [3] | T [3] | N | Y | N | N | RMS_Protected_Fmt
pkcs7sr [4] | PKCS #7 cryptographic format | N | N | Y | Y | N | N | N | PKCS_7_Fmt
pngsr | Portable Network Graphics | M | M | N | N | Y | N | N | PNG_Fmt
psdsr | Adobe Photoshop | N | N | N | N | Y [5] | N | N | PSD_Fmt

[1] Includes support for extraction of subfiles from PDF Portfolio documents.
[2] The pffsr reader is only available on certain platforms (see pffsr in the platform differences section).
[3] KeyView filters only the internal redirection text. The underlying document text is not accessible without the decryption key.
[4] This reader supports PKCS #7 signed-data encapsulating PKCS #7 data only.
[5] Only XMP metadata is extracted for this format.

Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
pstnsr | Microsoft Outlook Personal Folder [1] (97 onwards) | N | N | Y | Y | Y | Y | N | MS_OutlookPST_Fmt
pstsr [2] | Microsoft Outlook Personal Folder [1] (97 onwards) | N | N | Y | Y | Y | N | N | MS_OutlookPST_Fmt
pstxsr | Microsoft Outlook Personal Folder [1] (97 onwards) | N | N | Y | Y | Y | Y | N | MS_OutlookPST_Fmt
pwsr | PRIMEWORD | Y | T | T | N | N | N | N | PRIMEWORD_Fmt
pxlsr | HP PCL XL (PCL 6) | Y | T | T | N | N | N | N | HP_PCL_XL_Fmt
qpssr | Corel Quattro Pro (5, 6, 7, 8) | Y | Y | Y | N | P | Y | N | Quattro_Pro_Win_Fmt
qpwsr | Corel Quattro Pro (X4) | Y | N | Y | N | P | Y | N | QPW_Fmt
rarsr | RAR archive (2.0 through 3.5) | N | N | N | Y | N | n/a | N | RAR_Fmt

[1] KeyView provides several readers capable of processing PST files. The pstsr reader uses the Microsoft Messaging Application Programming Interface (MAPI), works only on Windows, and requires that Microsoft Outlook is installed. The pstxsr reader is available only on certain platforms (see pstxsr in the platform differences section) and does not require Microsoft Outlook. The pstnsr reader is an alternative reader that does not require Microsoft Outlook, for all platforms not supported by pstxsr. For more information about these readers, see "Extract Subfiles from Outlook Personal Folders Files" in Chapter 3.
[2] This reader supports both clear signed and encrypted S/MIME. KeyView supports S/MIME for PST, EML, MBX, and MSG files.

Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
riffsr | Microsoft Wave Sound | M | N | N | N | Y | N | N | MS_WAVE_Audio_Fmt
rpmsgsr [1] | Microsoft Outlook Restricted Permission Message | N | N | N | Y [2] | N | Y | N | RPMSG_Fmt
rtfsr | Rich Text Format (1 through 1.7) | Y | Y | Y | N | P | Y | Y | MS_Pocket_Word_Fmt, MS_RTF_Fmt
sassr | SAS7BDAT reader | Y | T | T | N | N | N | N | SAS7BDAT_Fmt
skypesr | Skype Log (3) | Y | Y | Y | N | N | N | N | Skype_Fmt
sosr | OpenOffice, LibreOffice (1-5), StarOffice (6-9) | Y | T | T | N | Y | Y | N | SO_Spreadsheet_XML_Fmt
starcsr | StarOffice Calc (3, 4, 5) | Y | T | T | N | N | N | N | SO_Spreadsheet_Fmt
starwsr | StarOffice Writer (3, 4, 5) | Y | T | T | N | N | N | N | SO_Text_Fmt
stringssr | Generic 'strings' reader | Y | T | T | N | N | N | N | BeagleWorks_Word_Fmt, CEOwrite_Fmt, CPT_Comm_Fmt, CWK_Fmt, DG_CDS_Fmt, DSA101_Fmt, Data_Point_VistaWord_Fmt, Enable_WP_Fmt, GreatWorks_Word_Fmt, HP_Word_PC_Fmt, IBM_DCF_Script_Fmt, IBM_Writing_Assistant_Fmt, Lotus_Notes_CDF_Fmt, Lyrix_Fmt, MASS_11_Fmt, MS_Works_DOS_WP_Fmt, MS_Works_Mac_WP_Fmt, MacWrite_Fmt, MacWrite_II_Fmt, Multimate_Adv_Fmt, Multimate_Adv_Fnote_Fmt, Multimate_Adv_II_Fmt, Multimate_Adv_II_Fnote_Fmt, Multimate_Fmt, Multimate_Fnote_Fmt, Navy_DIF_Fmt, ODA_Q1_11_Fmt, ODA_Q1_12_Fmt, Office_Writer_Fmt, Psion_TextEd_Fmt, Psion_Word_3_Fmt, Psion_Word_Fmt, Q_A_DOS_Fmt, Q_A_Win_Fmt, Quadratron_Q_One_v1_Fmt, Quadratron_Q_One_v2_Fmt, Quickword_Fmt, SAMNA_Word_IV_Fmt, Symbol_Dynamics_EXP5_Fmt, Symbol_Dynamics_EXP_Fmt, Targon_Word_Fmt, Uniplex_WP_Fmt, Volkswriter_Fmt, WANG_WITA_Fmt, WANG_WPS_Comm_Fmt, WPS_PLUS_Fmt, WordERA_Fmt, WordMARC_Fmt, WordPerfect_Fmt, WordStar_2000_Fmt, WordStar_Fmt, WordStar_for_Windows_Fmt, Word_Connection_Fmt, WriteNow_Fmt, Xerox_860_Comm_Fmt, Xerox_Writer_Fmt

[1] The rpmsgsr reader is only available on certain platforms (see rpmsgsr in the platform differences section).
[2] Extraction of embedded email messages is not currently supported.

Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
swfsr | Macromedia Flash (through 8.0) | Y | Y | Y | N | N | Y [1] | N | Macromedia_Flash_Fmt
swsr | Informix SmartWare II Word Processor | Y | T | T | N | N | N | N | SmartWare_II_WP_Fmt
tarsr | TAR Tape Archive | N | N | Y | Y | N | n/a | N | TAR_Fmt

[1] The character set cannot be determined for versions 5.x and lower.

Filter SDK Java Programming Guide Appendix B: Document Readers

Reader tifsr tnefsr
unihtmsr unisr unzip
uudsr vcfsr
vsdsr
wkssr

Description

Filter Export View Extract Metadata Charset H/F Associated File Formats

TIFF Tagged Image M

M

N

N

Y

File (through 6.01)

N

N

TIFF_Fmt

Transport Neutral N

N

Y

Y

Y

Encapsulation

Format

Y

N

TNEF_Fmt

Unicode HTML

Y

Y

Y

N

Y

Y

N

Unicode_HTML_Fmt

Unicode Text (3, 4) Y

Y

Y

N

N

Y

N

Unicode_Fmt

PKZIP/Zip

N

N

Y2

Y

N

Compression

n/a

N

Executable_JAR_Fmt, KMZ_Fmt, ODF_

Formula_Fmt, ODF_Formula_Template_

Fmt, PKZIP_Fmt, Tableau_Packaged_

Data_Source_Fmt, Tableau_Packaged_

Workbook_Fmt

UU-Encoding (all

N

N

Y

Y

N

versions)

n/a

N

UUEncoded_Fmt

Microsoft Outlook Y

Y

T

N

Y

vCard Contact (2.1,

3.0, 4.0)

N

N

VCF_Fmt

Microsoft Visio (4, 5, Y

Y

Y

Y

Y

2000, 2002, 2003,

2007, 20103)

Y

N

MS_Visio_Fmt

Lotus 1-2-3 (2, 3, 4, Y

Y

Y

N

N

Y

N

Lotus_123_Worksheet_Fmt

1The following compression types are supported: no compression, CCITT Group 3 1-Dimensional Modified Huffman, CCITT Group 3 T4 1-Dimensional, CCITT Group 4 T6, LZW, JPEG (only Gray, RGB and CMYK color space are supported), and PackBits. 2PKZIP, WinZip, and Java Archive only 3Viewing and Export use the graphic reader, kpVSD2rdr for Microsoft Visio 2003, 2007, and 2010, and vsdsr for all earlier versions. Image fidelity in Viewing and Export is therefore only supported for versions 2003 and above. Filter uses the graphic reader kpVSD2rdr for Microsoft Visio 2003, 2007, and 2010, and vsdsr for all earlier versions.

IDOL KeyView (12.12)

Page 199 of 280

Filter SDK Java Programming Guide Appendix B: Document Readers

Reader wosr wp6sr wpmsr xlsbsr xlssr xlsxsr xmlsr

Description

Filter Export View Extract Metadata Charset H/F Associated File Formats

5)

Corel WordPerfect Y

Y

Y

N

P

Windows (5, 5.1)

Y

Y

WordPerfect_5_Fmt

Corel WordPerfect Y

Y

Y

N

P

(6 onwards)

Y

N

WordPerfect_6_Fmt

Corel WordPerfect Y

Y

Y

N

N

Macintosh (1.02, 2,

2.1, 2.2, 3, 3.1)

Y

N

WordPerfect_Mac_Fmt

Microsoft Excel

Y

Y

Binary Format (2007

onwards)

Microsoft Excel (2.2 Y

Y

to 2004)

Y

N

Y

Y

Y1

Y

N

N

MS_Excel_Binary_2007_Fmt

Y

Y2 Excel_2000_Fmt, Excel_95_Fmt, Excel_

97_Fmt, Excel_Chart_Fmt, Excel_Fmt,

Excel_Macro_Fmt, WPS_Office_SS_Fmt

Microsoft Excel

Y

Y

Y

Y

Y

Windows XML (2007

onwards)

Y

Y

MS_Excel_2007_Fmt, MS_Excel_Macro_

2007_Fmt

XML

Y

T

T

N

Y

Y

N

AMF_Fmt, AbiWord_Fmt, Adobe_XML_

Data_Package_Fmt, Atom_Syndication_

Fmt, CDXML_Fmt, Chemical_Markup_

Language_Fmt, Collada_DAE_Fmt,

Consolidated_CDA_Fmt, ESzigno_Fmt,

FictionBook_Fmt, Grasshopper_GHX_Fmt,

JNLP_Fmt, JavaView_JVX_Fmt, KML_Fmt,

MAML_Fmt, MARC_XML_Fmt, METS_

1Supported using the embedded objects reader olesr. 2Microsoft Excel for Windows only
IDOL KeyView (12.12)

Page 200 of 280

Filter SDK Java Programming Guide Appendix B: Document Readers

Reader
xpssr xywsr yimsr1

Description

Filter Export View Extract Metadata Charset H/F Associated File Formats

Fmt, MODS_Fmt, MS_Excel_XML_Fmt, MS_Management_Pack_MPX_Fmt, MS_ Visio_XML_Fmt, MS_Word_XML_Fmt, MXML_Fmt, Mathcad_XML_Fmt, Metalink_ Fmt, Mozilla_XUL_Fmt, MusicXML_Fmt, Open_Diagnostic_Data_Exchange_Fmt, Open_eBook_Fmt, PDF_XML_Forms_ Data_Fmt, PGML_Fmt, PLS_Fmt, RDF_ XML_Fmt, RSS_Fmt, Really_Simple_ Discovery_Fmt, SBML_Fmt, SMIL_Fmt, SPARQL_Results_Fmt, SRGS_Fmt, SRU_ Fmt, SSML_Fmt, SVG_Fmt, SyncML_Fmt, TEI_Fmt, Tableau_Data_Source_Fmt, Tableau_Map_Source_Fmt, Tableau_ Preferences_Fmt, Tableau_Workbook_ Fmt, Uniform_Office_Fmt, Uniform_Office_ Text_Fmt, VTK_XML_Fmt, VoiceXML_Fmt, WML_Fmt, Windows_Audio_Playlist_Fmt, XAML_Browser_Application_Fmt, XBRL_ Fmt, XDF_Fmt, XLIFF_Fmt, XML_Fmt, XML_Shareable_Playlist_Fmt, XSLT_Fmt, XSL_FO_Fmt, YIN_Fmt

Microsoft XML

Y

T

T

N

N

Paper Specification

N

N

MS_XPS_Fmt

XyWrite / Nota Bene Y

Y

Y

N

N

(4.12)

N

N

XyWrite_Fmt

Yahoo! Instant

Y

Y

Y

N

N

Messenger

N

N

YIM_Fmt

1To successfully use this reader, you must set the KV_YAHOO_ID environment variable to the Yahoo user ID. You can optionally set the KV_OTHER_YAHOO_ID environment variable to the other Yahoo user ID. If you do not set it, "Other" is used by default. If you enter incorrect values for the environment variables, erroneous data is generated.


Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
z7zsr | 7-Zip archive (4.57) | N | N | Y | Y | N | n/a | N | Z7Z_Fmt
zstdsr | Zstandard compression | N | N | N | Y | N | n/a | N | Zstandard_Fmt


Appendix C: Platform Differences

Most KeyView features and document readers are available across all platforms. This section describes the supported platforms for certain features that are not available on every platform.

· Feature Differences

204

· Reader Differences

205


Feature Differences

Features (table rows):
· Filter C++ API
· Filter .NET API
· RMS Decryption
· XMP extraction1
· XMP extraction - HTML (HTML_Fmt)
· XMP extraction - additional formats2
· Advanced character set detection
· Source code identification
· Optical Character Recognition
· KVOOP privilege reduction
· Out-of-process logging

Platforms (table columns): Windows (x64, x86); Linux (x64, x86, AArch64); macOS (M1, x64); Solaris (x64, x86, SPARC64, SPARC); AIX (ppc64, ppc32)

[Table: availability of each feature on each platform]

1This refers to formats PDF (PDF_Fmt), PNG (PNG_Fmt), PSD (PSD_Fmt), JPG (JPEG_File_Interchange_Fmt), TIFF (TIFF_Fmt), XML (XML_Fmt) and pFile (RMS_Protected_Fmt).
2This refers to formats GIF (GIF_87a_Fmt / GIF_89a_Fmt), jpeg2000 (JPEG_2000_JP2_File_Fmt), SVG (SVG_Fmt), MOV (QuickTime_Fmt), AIFF (AIFF_Fmt), FLV (Flash_Video_Fmt), SWF (Macromedia_Flash_Fmt), MP3 (MPEG_Audio_Fmt), MPEG4 (ISO_IEC_MPEG_4_Fmt), WAV (MS_WAVE_Audio_Fmt), AVI (MS_Video_Fmt), EPS (EPSF_Fmt, Preview_EPSF_Fmt), INDD (InDesign_Fmt), WMA (WMA_Fmt) and WMV (WMV_Fmt).


Reader Differences

Readers (table rows):
· avrosr (Apache Avro reader)
· cebsr (Founder Chinese E-paper Basic reader)
· htmlsr (HTML reader for XMP extraction)
· iwss13sr (Apple iWork 2013 Numbers reader)
· iwwp13sr (Apple iWork 2013 Pages reader)
· kpDWGrdr (Autodesk AutoCAD Drawing reader for platforms without kpODArdr)
· kpDXFrdr (Autodesk AutoCAD DXF reader for platforms without kpODArdr)
· kpIWPG13rdr (Apple iWork 2013 Keynote reader)
· kpODArdr (Autodesk AutoCAD reader)
· kppdf2rdr (alternative graphic-based PDF reader)
· lwpsr (Lotus Word Pro reader)
· multiarcsr (multiple archive formats reader)
· nsfsr (Lotus Notes database reader)
· orcsr (Apache ORC reader)
· parquetsr (Apache Parquet reader)
· pdf2sr (alternative PDF reader)
· pffsr (Microsoft Outlook Offline Folders File reader)
· pstsr (MAPI-based PST reader)
· pstnsr (native PST reader for platforms without pstxsr)
· pstxsr (native PST reader)
· rpmsgsr (Microsoft Outlook Restricted Permission Message reader)

Platforms (table columns): Windows (x64, x86); Linux (x64, x86, AArch64); macOS (M1, x64); Solaris (x64, x86, SPARC64, SPARC); AIX (ppc64, ppc32)

[Table: availability of each reader on each platform]

This topic shows only those readers that are unavailable on at least one platform. For a complete list of readers, see Document Readers, on page 175.


Appendix D: Character Sets

This section provides information on the handling of character sets in the KeyView suite of products, which includes KeyView Filter SDK, KeyView Export SDK, and KeyView Viewing SDK.

· Multibyte and Bidirectional Support

207

· Coded Character Sets

215

Multibyte and Bidirectional Support

The KeyView SDKs can process files that contain multibyte characters. A multibyte character encoding represents a single character with a sequence of consecutive bytes. KeyView can also process files that contain bidirectional text. Bidirectional text contains both text that is read from left to right (for example, Latin-based text) and text that is read from right to left (for example, Hebrew and Arabic).
The following table indicates which character encodings are supported by KeyView for each format.
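To make the two terms concrete, the following sketch uses only standard Java classes (it does not call KeyView). It shows that one character can occupy several bytes in a multibyte encoding, and uses java.text.Bidi to detect text that contains right-to-left characters. The class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.text.Bidi;

public class MultibyteBidiDemo {

    /** Number of bytes that the string occupies when encoded as UTF-8. */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** True if the text contains right-to-left characters and so needs bidirectional handling. */
    static boolean needsBidi(String s) {
        char[] chars = s.toCharArray();
        return Bidi.requiresBidi(chars, 0, chars.length);
    }

    public static void main(String[] args) {
        String latin = "A";                          // one byte in UTF-8
        String kanji = "\u65E5";                     // one character, three bytes in UTF-8
        String mixed = "KeyView \u05E9\u05DC\u05D5\u05DD"; // Latin text followed by Hebrew

        System.out.println(utf8Length(latin));
        System.out.println(utf8Length(kanji));
        System.out.println(needsBidi("KeyView"));
        System.out.println(needsBidi(mixed));
    }
}
```

UTF-8 is used here only because every Java runtime supports it. The same idea applies to the other multibyte encodings listed in this appendix.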

Multibyte and bidirectional support

Format | Single-byte | Multibyte | Bidirectional

Archive
7-Zip (7Z) | n/a | n/a | n/a
AD1 Evidence file | n/a | n/a | n/a
ADJ | n/a | n/a | n/a
B1 | n/a | n/a | n/a
BinHex (HQX) | n/a | n/a | n/a
Bzip2 (BZ2) | n/a | n/a | n/a
EnCase - Expert Witness Compression Format (E01) | n/a | n/a | n/a
GZIP (GZ) | n/a | n/a | n/a
ISO (ISO) | n/a | n/a | n/a
Java Archive (JAR) | n/a | n/a | n/a
Legato EMailXtender Archive (EMX) | n/a | n/a | n/a
MacBinary (BIN) | n/a | n/a | n/a
Mac Disk Copy Disk Image (DMG) | n/a | n/a | n/a
Microsoft Backup File (BKF) | n/a | n/a | n/a
Microsoft Cabinet format (CAB) | n/a | n/a | n/a
Microsoft Compiled HTML Help (CHM) | n/a | n/a | n/a
Microsoft Compressed Folder (LZH) | n/a | n/a | n/a
PKZip (ZIP) | n/a | n/a | n/a
Microsoft Outlook DBX (DBX) | Y | Y | Y
Microsoft Outlook Offline Storage File (OST) | Y | Y | Y
RAR Archive (RAR) | n/a | n/a | n/a
Tape Archive (TAR) | n/a | n/a | n/a
UNIX Compress (Z) | n/a | n/a | n/a
UUEncoding (UUE) | n/a | n/a | n/a
Windows Scrap File (SHS) | n/a | n/a | n/a
WinZip (ZIP) | n/a | n/a | n/a

Binary
Executable (EXE) | n/a | n/a | n/a
Link Library (DLL) | n/a | n/a | n/a

Computer-aided Design
AutoCAD Drawing (DWG) | Y | Y | Y
AutoCAD Drawing Exchange (DXF) | Y | Y | Y
CATIA formats (CAT) | Y | N | N
Microsoft Visio (VSD) | Y | Y | Y

Database
dBase Database | Y | N | N
Microsoft Access (MDB) | Y | Y | N
Microsoft Project (MPP) | Y | Y | N

Desktop Publishing
Microsoft Publisher | N | Y | N

Display
Adobe Portable Document Format (PDF) | Y | Y1 | Y

Graphics
Computer Graphics Metafile (CGM) | Y | N | N
Corel DRAW (CDR) | n/a | n/a | n/a
DCX Fax System (DCX) | Y | N | N
DICOM - Digital Imaging and Communications in Medicine (DCM) | n/a | n/a | n/a
Encapsulated PostScript (EPS) | Y | N | N
Enhanced Metafile (EMF) | Y | Y | N
Graphic Interchange Format (GIF) | n/a | n/a | n/a
JBIG2 | n/a | n/a | n/a
JPEG | n/a | n/a | n/a
JPEG 2000 | n/a | n/a | n/a
Lotus AMIDraw Graphics (SDW) | n/a | n/a | n/a
Lotus Pic (PIC) | n/a | n/a | n/a
Macintosh Raster (PICT/PCT) | n/a | n/a | n/a
MacPaint (PNTG) | n/a | n/a | n/a
Microsoft Office Drawing (MSO) | n/a | n/a | n/a
Omni Graffle (GRAFFLE) | Y | N | N
PC PaintBrush (PCX) | n/a | n/a | n/a
Portable Network Graphics (PNG) | n/a | n/a | n/a
SGI RGB Image (RGB) | n/a | n/a | n/a
Sun Raster Image (RS) | n/a | n/a | n/a
Tagged Image File (TIFF) | Y | N | N
Truevision Targa (TGA) | n/a | n/a | n/a
Windows Animated Cursor (ANI) | n/a | n/a | n/a
Windows Bitmap (BMP) | n/a | n/a | n/a
Windows Icon Cursor (ICO) | n/a | n/a | n/a
Windows Metafile (WMF) | Y | Y | N
WordPerfect Graphics 1 (WPG) | Y | N | N
WordPerfect Graphics 2 (WPG) | Y | N | N

Mail
Documentum EMCMF Format | Y | Y | Y
Domino XML Language (DXL) | Y | Y | N
GroupWise FileSurf | Y | N | N
Legato Extender (ONM) | Y | Y | N
Lotus Notes database (NSF) | Y | Y | Y
Mailbox (MBX) | Y | Y | Y
Microsoft Entourage Database | Y | Y | Y
Microsoft Outlook (MSG) | Y | Y | Y
Microsoft Outlook Express (EML) | Y | Y | Y
Microsoft Outlook iCalendar | Y | Y | Y
Microsoft Outlook for Macintosh | Y | Y | Y
Microsoft Outlook Offline Storage File | Y | Y | Y
Microsoft Outlook Personal File Folders (PST) | Y | Y | Y

1Multibyte PDFs are supported, provided the PDF document is created by using either Character ID-keyed (CID) fonts, predefined CJK CMap files, or ToUnicode font encodings, and does not contain embedded fonts. See the Adobe website and the Adobe Acrobat documentation for more information. Any multibyte characters that are not supported are displayed using the replacement character. By default, the replacement character is a question mark (?).
To determine the type of font encodings that are used in a PDF, open the PDF in Adobe Acrobat, and select File > Document Info > Fonts. If the Encoding column lists Custom or Embedded encodings, you might encounter problems converting the PDF.

Format

Single-byte Multibyte

Bidirectional

Microsoft Outlook vCard Contact

Text Mail (MIME)

Y

Y

Y

Transport Neutral Encapsulation Y

Y

Y

Format

Multimedia

Advanced Systems Format

n/a

n/a

n/a

(ASF)

Audio Interchange File Format n/a

n/a

n/a

(AIFF)

Microsoft Wave Sound (WAV) n/a

n/a

n/a

MIDI (MID)

n/a

n/a

n/a

MPEG 1 Audio Layer 3 (MP3) n/a

n/a

n/a

MPEG 1 Video (MPG)

n/a

n/a

n/a

MPEG 2 Audio (MPEGA)

n/a

n/a

n/a

MPEG 4 Audio (MP4)

n/a

n/a

n/a

NeXT/Sun Audio (AU)

n/a

n/a

n/a

QuickTime Movie (QT/MOV)

n/a

n/a

n/a

Windows Video (AVI)

n/a

n/a

n/a

Presentations

Apple iWork Keynote (GZ)

Y

Y

N

Applix Presents (AG)

character set N

N

1252 only

Corel Presentations (SHW)

character set N

N

1252 only

Extensible Forms Description

Y

Y

N

Language (XFD)

Lotus Freelance Graphics 2

character set N

N

(PRE)

850 only

Lotus Freelance Graphics (PRZ) Y

Japanese, Simple Chinese, N Traditional Chinese, Thai only


Macromedia Flash (SWF)

Y

Y

N

Microsoft OneNote

Y

Y

N

Microsoft PowerPoint PC (PPT)
Microsoft PowerPoint Windows (PPT)

character set Traditional Chinese only 1252 only

Y

Japanese, Simple Chinese,

Traditional Chinese,

Korean only

N Hebrew only

Microsoft PowerPoint Macintosh Y

N

N

(PPT)

Microsoft PowerPoint Windows Y

Y

Y

XML 2007 and 2010 (PPTX)

OASIS Open Document (ODP) Y

Y

N

OpenOffice Impress (ODP)

Y

Y

N

StarOffice Impress (ODP)

Y

Y

N

Spreadsheets

Apple iWork Numbers (GZ)

Y

Y

N

Applix Spreadsheets (AS)

character set N

N

1252 only

Comma Separated Values (CSV) character set N

N

1252 only

Corel Quattro Pro (QPW/WB3) Y

N

N

Data Interchange Format (DIF) Y

Y

Y1

Lotus 1-2-3 (123)

Y

Y

Y

Lotus 1-2-3 (WK4)

Y

Y

N

Lotus 123 Charts (123)

Y

Y

N

Microsoft Excel Charts (XLS)

Y

Y

N

Microsoft Excel Macintosh (XLS) Y

N

N

Microsoft Excel Windows (XLS) Y

Y

Y2

Microsoft Excel Windows XML Y

Y

N

2007 (XLSX)

Microsoft Office Excel Binary

Y

Y

N

Format (XLSB)


Format

Single-byte Multibyte

Microsoft Works Spreadsheet Y

N

(S30/S40)

OASIS Open Document (ODS) Y

Y

OpenOffice Calc (ODS)

Y

Y

StarOffice Calc (ODS)

Y

Y

Text and Markup

ANSI (TXT) ASCII (TXT)

Y

Y

Y

Y

HTML (HTM)

Y

Y

Microsoft Excel Windows XML Y

Y

2003

Microsoft Word for Windows XML Y

Y

2003

Microsoft Visio XML 2003

Y

Y

Rich Text Format (RTF)

Y

Y

Unicode HTML

Y

Y

Unicode Text (TXT) XHTML

Y

Y

Y

Y

XML Word Processing

Y

Y

Adobe Maker Interchange Format (MIF)

character set N 1252 only

Apple iChat Log (ICHAT)

Y

Y

Apple iWork Pages (GZ)

Y

Y

Applix Words (AW)

character set N 1252 only

DisplayWrite (IP) Folio Flat File (FFF)

character set N 500, 1026 only
character set N 1252 only


Bidirectional N N N N
Y2 Y2 Y2, 2 Y Y Y Y3 Y 2,3 Y2 Y3 Y
N N N N N
N

Founder Chinese E-paper Basic Y

Y

N

(CEB)

Fujitsu Oasys (OA2)

Y

Y

N

Hangul (HWP)

Y

Y

N

Health level7 (HL7)

Y

Y

Y

IBM DCA/RTF (DC)

character sets N

N

500, 1026

only

JustSystems Ichitaro (JTD)

Y

Y

N

Lotus AMI Pro (SAM)

Y

Simple Chinese, Traditional Y Chinese, Japanese, Thai only

Lotus AMI Professional Write

Y

Plus (AMI)

Lotus Word Pro (LWP)

Y

Simple Chinese, Traditional N Chinese, Japanese, Thai only

Y

Y3

Lotus SmartMaster (MWP)

Y

Y

N

Microsoft Word PC (DOC)

character set N

N

1252 only

Microsoft Word Windows V1-2 Y

N

(DOC)

Microsoft Word Windows V6, 7, Y

Y

8, 95 (DOC)

Microsoft Word Windows V97 Y

Y

through 2003 (DOC)

Microsoft Word Windows XML Y

Y

2007 and 2010 (DOCX)

Microsoft Word Macintosh (DOC) Y

N

N Hebrew only3 Y3 Y3 Y3

Microsoft Works (WPS)

Y

Japanese only

N

Microsoft Write (WRI)

Y

Japanese only

N

OASIS Open Document (ODT) Y

Y

N

Omni Outliner (OO3)

Y

Y

N

OpenOffice Writer (ODT)

Y

Y

N


Open Publication Structure

Y

Y

Y

eBook (EPUB)

StarOffice Writer (ODT)

Y

Y

N

Skype Log (DBB)

Y

Y (null-terminated charsets) N

WordPad (RTF)

Y

Y

Y

WordPerfect Linux (WPS)

Y

N

N

WordPerfect Macintosh (WPS) Y

N

N

WordPerfect Windows (WO)

Y

N

N

XML Paper Specification (XPS) Y

Y

N

XYWrite Windows (XY4)

character set N

N

1252 only

Yahoo! Instant Messenger (DAT) Y

Y (null-terminated charsets) N

1The text direction in the output file might not be correct.
2In Export SDK, a bidirectional right-to-left (RTL) tag is extracted from this format and included in the direction element (<dir=RTL>) of the output.

Coded Character Sets

This section lists the coded character sets and indicates which of them you can specify as the target character set. The coded character sets are enumerated in kvcharset.h and defined in the Filter class.

Coded Character Sets

Coded Character Set | Description | Can be set as target charset?
KVCS_UNKNOWN | Unknown character set | N
KVCS_SJIS | Japanese (uses multibyte encoding), cp932 | Y
KVCS_GB | Simplified Chinese (China, Singapore, Malaysia) cp936 | Y
KVCS_BIG5 | Traditional Chinese (Taiwan, Hong Kong, Macau) cp950 | Y
KVCS_KSC | Korean, cp949 | Y
KVCS_1250 | Windows Latin 2 (Central Europe) | Y
KVCS_1251 | Windows Cyrillic (Slavic) | Y
KVCS_1252 | Windows Latin 1 (ANSI) | Y
KVCS_1253 | Windows Greek | Y
KVCS_1254 | Windows Latin 5 (Turkish) | Y
KVCS_1255 | Windows Hebrew | Y
KVCS_1256 | Windows Arabic | Y
KVCS_1257 | Windows Baltic Rim | Y
KVCS_1258 | Windows Vietnamese | Y
KVCS_8859_1 | ISO 8859-1 Latin 1 (Western Europe, Latin America) | Y
KVCS_8859_2 | ISO 8859-2 Latin 2 (Central Eastern Europe) | Y
KVCS_8859_3 | ISO 8859-3 Latin 3 (S.E. Europe) | Y
KVCS_8859_4 | ISO 8859-4 Latin 4 (Scandinavia/Baltic) | Y
KVCS_8859_5 | ISO 8859-5 Latin/Cyrillic | Y
KVCS_8859_6 | ISO 8859-6 Latin/Arabic | Y
KVCS_8859_7 | ISO 8859-7 Latin/Greek | Y
KVCS_8859_8 | ISO 8859-8 Latin/Hebrew | Y
KVCS_8859_9 | ISO 8859-9 Latin/Turkish | Y
KVCS_8859_14 | ISO 8859-14 | Y
KVCS_8859_15 | ISO 8859-15 | Y
KVCS_437 | DOS Latin US | Y
KVCS_737 | DOS Greek | Y
KVCS_775 | DOS Baltic Rim | Y
KVCS_850 | DOS Latin 1 | Y
KVCS_851 | DOS Greek | Y
KVCS_852 | DOS Latin 2 | Y
KVCS_855 | DOS Cyrillic | Y
KVCS_857 | DOS Turkish | Y
KVCS_860 | DOS Portuguese | Y
KVCS_861 | DOS Icelandic | Y
KVCS_862 | DOS Hebrew | Y
KVCS_863 | DOS Canadian French | Y
KVCS_864 | DOS Arabic | Y
KVCS_865 | DOS Nordic | Y
KVCS_866 | DOS Cyrillic Russian | Y
KVCS_869 | DOS Greek 2 | Y
KVCS_874 | Thai | Y
KVCS_PDFMACDOC | PDF MAC DOC | N
KVCS_PDFWINDOC | PDF WIN DOC | N
KVCS_STDENC | Adobe Standard Encoding | N
KVCS_PDFDOC | Adobe standard PDF character set | N
KVCS_037 | EBCDIC code page 037 | Y
KVCS_1026 | EBCDIC code page 1026 | Y
KVCS_500 | EBCDIC code page 500 | Y
KVCS_875 | EBCDIC code page 875 | Y
KVCS_LMBCS | Lotus multibyte character set Group 1 and Group 2 | N
KVCS_UNICODE | Unicode, UCS-2 | Y
KVCS_UTF16 | 16-bit Unicode transformation format | Y
KVCS_UTF8 | 8-bit Unicode transformation format | Y
KVCS_UTF7 | 7-bit Unicode transformation format | Y
KVCS_2022_JP | ISO 2022-JP, Japanese mail and news safe encoding (JIS-7) | N
KVCS_2022_CN | ISO 2022-CN, Chinese mail and news safe encoding | N
KVCS_2022_KR | ISO 2022-KR, Korean mail and news safe encoding | N
KVCS_WP6X | Word Perfect 6.x and higher character mapping | N
KVCS_10000 | Western European (Macintosh) | Y
KVCS_KSC5601 | Unified Hangul | Y
KVCS_GB2312 | Simplified Chinese (China, Singapore, Hong Kong) | Y
KVCS_GB12345 | Traditional Chinese (China) - analogue of GB2312 | Y
KVCS_CNS11643 | Traditional Chinese - Taiwan. Supplement to Big5 | Y
KVCS_JIS0201 | Japanese - contains ASCII character set (JIS-Roman) | N
KVCS_JIS0212 | Japanese. Supplement to JIS0208. | Y
KVCS_EUC_JP | Japanese Extended UNIX Code | Y
KVCS_EUC_GB | Simplified Chinese Extended UNIX Code | Y
KVCS_EUC_BIG5 | Traditional Chinese Extended UNIX Code | N
KVCS_EUC_KSC | Korean Extended UNIX Code | N
KVCS_424 | EBCDIC Hebrew | N
KVCS_856 | PC Hebrew (old) | N
KVCS_1006 | IBM AIX Pakistan (Urdu) | N
KVCS_KOI8R | Cyrillic (Russian) | Y
KVCS_PDF_JAPAN1 | Adobe-Japan1-2 character collection | N
KVCS_PDF_KOREA1 | Adobe-Korea1-0 character collection | N
KVCS_PDF_GB1 | Adobe-GB1-3 character collection | N
KVCS_PDF_CNS1 | Adobe-CNS1-2 character collection | N
KVCS_2022_JP_8 | ISO 2022-JP, Japanese mail and news safe encoding (JIS8) | N
KVCS_720 | Arabic DOS-720 | Y
KVCS_VISCII | Vietnamese VISCII | Y
KVCS_8859_10 | ISO 8859-10 (Latin 6 Nordic) | Y1
KVCS_8859_13 | ISO 8859-13 (Latin 7 Baltic) | Y1
KVCS_57002 | ISCII Devanagari (x-iscii-de) | Y1
KVCS_57003 | ISCII Bengali (x-iscii-be) | Y1
KVCS_57004 | ISCII Tamil (x-iscii-ta) | Y1
KVCS_57005 | ISCII Telugu (x-iscii-te) | Y1
KVCS_57006 | ISCII Assamese (x-iscii-as) | Y1
KVCS_57007 | ISCII Oriya (x-iscii-or) | Y1
KVCS_57008 | ISCII Kannada (x-iscii-ka) | Y1
KVCS_57009 | ISCII Malayalam (x-iscii-ma) | Y1
KVCS_57010 | ISCII Gujarathi (x-iscii-gu) | Y1
KVCS_57011 | ISCII Panjabi (x-iscii-pa) | Y1
KVCS_GB18030b2 | Reserved for internal use | n/a
KVCS_GB18030 | GB18030 (Chinese 4-byte character set) | Y
KVCS_8859_11 | ISO 8859-11 (Thai) | Y
KVCS_8859_16 | ISO 8859-16 (Latin-10 South-Eastern Europe) | Y
KVCS_ARABICMAC | Arabic Mac (x-mac-arabic) | Y
KVCS_KOI8U | Cyrillic (KOI8U Ukrainian) | Y
KVCS_HZGB2312 | The 7-bit representation of GB 2312 / RFC 1842 | n/a
KVCS_UTF32 | 32-bit Unicode transformation format | Y

1The character set cannot be forced as output in Export SDK and Viewing SDK because the character set is not supported by the major browsers.
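An application might guard against requesting a character set that the table above marks as unusable for output. The sketch below copies a few rows of the "Can be set as target charset?" column into a lookup map. The class, the map, and the method are hypothetical helpers, not part of the Filter class, and the map holds only an excerpt of the table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TargetCharsetTable {

    // Excerpt of the table: coded character set name -> can it be set as the target charset?
    static final Map<String, Boolean> CAN_BE_TARGET = new LinkedHashMap<>();
    static {
        CAN_BE_TARGET.put("KVCS_UNKNOWN", false);
        CAN_BE_TARGET.put("KVCS_SJIS", true);
        CAN_BE_TARGET.put("KVCS_1252", true);
        CAN_BE_TARGET.put("KVCS_PDFDOC", false);
        CAN_BE_TARGET.put("KVCS_LMBCS", false);
        CAN_BE_TARGET.put("KVCS_UTF8", true);
        CAN_BE_TARGET.put("KVCS_2022_JP", false);
        CAN_BE_TARGET.put("KVCS_EUC_JP", true);
    }

    /** Returns the table value for the name, or throws if the name is not in this excerpt. */
    static boolean canBeTarget(String name) {
        Boolean allowed = CAN_BE_TARGET.get(name);
        if (allowed == null) {
            throw new IllegalArgumentException("Not in this excerpt of the table: " + name);
        }
        return allowed;
    }

    public static void main(String[] args) {
        for (String name : CAN_BE_TARGET.keySet()) {
            System.out.println(name + " -> " + (canBeTarget(name) ? "usable as target" : "source only"));
        }
    }
}
```

Throwing for unknown names, rather than returning false, keeps a missing table row from being mistaken for an unsupported character set.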


Appendix E: Extract and Format Lotus Notes Subfiles

This section describes how to create XML templates to alter the appearance of extracted Lotus mail note subfiles so that they maintain the look and feel of the original notes.

· Overview

221

· Customize XML Templates

221

· Template Elements and Attributes

223

· Date and Time Formats

228

Overview
KeyView uses the NSF reader, nsfsr, to extract Lotus database files, and places Lotus mail notes in subfiles. The NSF reader uses a set of default XML templates to extract the notes and apply formatting, thereby approximating the look and feel of the original notes.
In some cases, you might need to customize the XML templates, for instance if your notes contain custom data. In such cases, you can modify the existing XML templates or create your own.
During extraction, the NSF reader loads all XML files in the NSFtemplates directory and its subdirectories (except for the NSFtemplates\images directory, which is reserved for images). During initialization, the KeyView XML parser verifies the XML templates. If the templates contain any invalid XML, elements, or attributes, initialization fails and errors are recorded in the nsfsr.log file.
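Because invalid XML in any template causes initialization to fail, it can help to check templates before deploying them. The following sketch, which uses only standard Java classes, lists the XML files under a template directory (skipping the images subdirectory, as described above) and tests each for well-formedness. It is an assumption-laden pre-check, not the KeyView parser: it cannot tell whether the elements and attributes are ones that KeyView accepts. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import javax.xml.parsers.DocumentBuilderFactory;

public class TemplatePrecheck {

    /** Lists .xml files under root and its subdirectories, skipping root/images. */
    static List<Path> templateFiles(Path root) throws IOException {
        Path images = root.resolve("images");
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().toLowerCase().endsWith(".xml"))
                        .filter(p -> !p.startsWith(images))
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    /** True if the file parses as well-formed XML; says nothing about KeyView-specific validity. */
    static boolean isWellFormed(Path file) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(file.toFile());
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // The directory is passed on the command line; no install path is assumed here.
        Path root = Paths.get(args.length > 0 ? args[0] : "NSFtemplates");
        for (Path template : templateFiles(root)) {
            System.out.println(template + ": " + (isWellFormed(template) ? "well-formed" : "INVALID XML"));
        }
    }
}
```

The authoritative check remains the nsfsr.log file, which records errors for invalid XML, elements, or attributes.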

Customize XML Templates
XML templates are enabled by default. In most cases, the default templates should be sufficient; however, you can customize them or create your own as required.
To customize XML templates for Lotus note extraction
1. Modify the template files in the following directory.
   install\OS\bin\NSFtemplates
   The main.xml file must exist in the NSFtemplates directory. It is the top-level template file that extracts all subfiles, usually by calling other templates.
2. Make sure that any modifications or additional XML files conform to the supported elements and attributes described in Template Elements and Attributes, on page 223.
3. Extract the Lotus database file.

Use Demo Templates
For testing purposes, you can extract notes by using a set of demo templates, which are provided to demonstrate the proper usage of all the XML elements and attributes, because the default templates do not use all the XML elements. The demo templates are available at: install\OS\bin\NSFtemplates
To use the demo XML templates
1. In the formats.ini file, set the following parameter.
   [nsfsr]
   UseDemoTemplate=1
2. In the main.xml file, uncomment the following section.
   <ifini name="UseDemoTemplate" text="1">
     <call file="demo.xml"/>
     <quit/>
   </ifini>
Use Old Templates
For testing purposes, you can extract notes by using legacy templates, which produce MHTML output. You can generate similar output by disabling the XML templates, but using the old templates enables you to see the XML code and compare it to the standard and demo templates.
To use the old XML templates
1. In the formats.ini file, set the following parameter.
   [nsfsr]
   UseOldTemplate=1
2. In the main.xml file, uncomment the following section.
   <ifini name="UseOldTemplate" text="1">
     <call file="default_old.xml"/>
     <quit/>
   </ifini>
Disable XML Templates
For testing purposes, you can disable XML templates, in which case KeyView extracts the notes in MHTML format. You can then compare the MHTML output that the NSF reader generates directly with the MHTML output that it generates through the XML templates.


To disable XML templates
1. In the formats.ini file, set the following parameter.
   [nsfsr]
   ExtractByTemplate=0
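When several of these testing switches are in play, it can be useful to report which ones the [nsfsr] section currently sets. The sketch below reads INI-style lines and returns the key/value pairs of one section. It assumes conventional INI syntax (section headers in square brackets, key=value lines, comment lines starting with ; or #), and it treats any ExtractByTemplate value other than 0 as enabled, which is an assumption based on the statement that templates are enabled by default. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NsfsrIniFlags {

    /** Returns the key=value pairs found in the named section of INI-style lines. */
    static Map<String, String> section(List<String> lines, String name) {
        Map<String, String> values = new LinkedHashMap<>();
        boolean inSection = false;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith(";") || line.startsWith("#")) {
                continue; // blank line or comment (assumed comment markers)
            }
            if (line.startsWith("[") && line.endsWith("]")) {
                inSection = line.substring(1, line.length() - 1).equalsIgnoreCase(name);
            } else if (inSection && line.indexOf('=') > 0) {
                int eq = line.indexOf('=');
                values.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
            }
        }
        return values;
    }

    /** Templates count as enabled unless ExtractByTemplate is explicitly 0. */
    static boolean templatesEnabled(Map<String, String> nsfsr) {
        return !"0".equals(nsfsr.get("ExtractByTemplate"));
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("[nsfsr]", "UseDemoTemplate=1", "ExtractByTemplate=0");
        Map<String, String> nsfsr = section(sample, "nsfsr");
        System.out.println("Demo templates requested: " + "1".equals(nsfsr.get("UseDemoTemplate")));
        System.out.println("XML templates enabled: " + templatesEnabled(nsfsr));
    }
}
```

In practice the lines would come from the formats.ini file, for example through Files.readAllLines.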

Template Elements and Attributes
This section lists the valid XML elements and attributes that you can use when creating or modifying templates. See the demo templates for examples.

Conditional Elements
The following table lists the valid conditional elements.

Conditional elements

<keyview>
    The KeyView XML template container ("root") element.

<if*>
    If the condition from the comparison is true, process the XML. Conditions can be nested up to 25 levels deep.
    Attributes
    · name. (Required) The name of the main item to compare to item or text.
    · item. (Required if no text) The name of the item to compare to the item specified by name.
    · text. (Required if no item) The text to compare to the item specified by name.

<ifex>, <ifnx>
    Respectively, if the name item exists and has a text value, or does not. The Notes item might have a value that cannot be converted to text, such as an image.

<ifeq>, <ifne>, <iflt>, <ifle>, <ifgt>, <ifge>
    Respectively, if text ==, !=, <, <=, >, >=. Text comparison uses a case-insensitive string compare.

<iftdeq>, <iftdne>, <iftdlt>, <iftdle>, <iftdgt>, <iftdge>
    Respectively, if time/date ==, !=, <, <=, >, >=. Time/date comparison converts dates to text in local time using the Notes default, TZFMT_NEVER, because Notes also sometimes converts fields to text internally. For example:
    text="06/30/2005 02:52:04 PM"

<iftzeq>, <iftzne>
    Respectively, if the time zone equals or does not equal the comparison text, for example CDT, EST, and so on.

<ifini>
    If the value of the INI option specified in name equals the text value.

<else>
    If the condition from the last <if> or <switch> was false, process XML.

<switch>
    If a name value exists, process XML.
    Attributes
    · name. (Required) The name of the main item to compare in <case> subelements.

<case>
    If the comparison condition is true, process XML, then stop processing the rest of <switch>.
    Attributes
    · text. (Required) The text to compare to the name item of <switch>.

<default>
    If all <case> conditions were false, process XML. This element must be the last element in <switch>, after all the <case> elements. Any <case> elements after the <default> element are ignored.

<for>
    If a name value exists, process XML. Process for each part of the name item.
    Attributes
    · name. (Required) The name of the main item.
    · max. (Optional) The maximum index to process. By default, all are processed.

<index>
    Output <for> loop index (1-based). <index> is only valid within a <for> element.
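The six text comparison elements map onto the six relational operators applied to a case-insensitive string comparison. The sketch below illustrates that mapping in Java. It is only a model of the described behavior: the guide does not specify the exact collation that KeyView uses, so String.compareToIgnoreCase stands in as an assumption, and the class and method names are hypothetical.

```java
public class TemplateCompare {

    /**
     * Evaluates a text comparison element against an item value and comparison text.
     * The element name is given without angle brackets, for example "ifeq".
     */
    static boolean test(String element, String itemValue, String text) {
        int c = itemValue.compareToIgnoreCase(text); // stand-in for KeyView's case-insensitive compare
        switch (element) {
            case "ifeq": return c == 0;
            case "ifne": return c != 0;
            case "iflt": return c < 0;
            case "ifle": return c <= 0;
            case "ifgt": return c > 0;
            case "ifge": return c >= 0;
            default:
                throw new IllegalArgumentException("Not a text comparison element: " + element);
        }
    }

    public static void main(String[] args) {
        // "Memo" and "memo" differ only in case, so the equality test succeeds.
        System.out.println(test("ifeq", "Memo", "memo"));
        System.out.println(test("iflt", "apple", "Banana"));
    }
}
```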

Control Elements
The following table lists the valid control elements.


Control Elements

<call>
    Call another XML template. You can nest templates up to 10 levels deep.
    Attributes
    · file. (Required) The template file name. This name must be unique.

<log>
    Log message to the NSF log file.
    Attributes
    · text. (Required) The text to log.
    · type. (Optional) The type of log message. The following values are valid:
      o ERROR
      o WARN
      o INFO
      o DIAG (the default option)
      o DEBUG
      o DUMP

<quit>
    Stop processing the template. Exits without error.
    Attributes
    · text. (Optional) The text to log.
    · type. (Optional) The type of log message. See <log>, above.

<stop>
    Stop processing the template. Exits with an ERROR log message.
    Attributes
    · text. (Required) The text to log.

Data Elements
The following table lists the valid data elements.

Data elements

<text>
    Output text.
    Attributes
    · name. (Required if there is no parent) The name of the item to output.

<rich>
    Output rich text (MHTML). Images are output in the next part or parts of the MHTML, after the first <HTML> part.
    Attributes
    · name. (Required if there is no parent) The name of the item to output.

<body>
    Output the message body in rich text (MHTML). As with <rich>, above, images are output in the next part or parts of the MHTML.

<form>
    Output the message form (usually $Body field) in rich text (MHTML).
    Attributes
    · name. (Required if there is no parent) The name of the item to output.

<addr>
    Output an address.
    Attributes
    · name. (Required if there is no parent) The name of the item to output.
    · type. (Optional) The type of address to output. Set this attribute to CN (Common Name), which is the only supported type.

<name>
    Output the name of the last name item, or in other words the current main item. The item must exist.

<format>
    Set the default format for <date> and <date_kv>. This element does not set the <text> format. See Date and Time Formats, on page 228 for a list of all Notes and KeyView date and time formats and integer values.
    Attributes
    · format. (Optional. Omit to reset to defaults) The Notes and KeyView date and time format. You can set the following formats:
      o TD=int. The Time Date format (TDFMT_*)
      o TS=int. The Time Show format (TSFMT_*)
      o TT=int. The Time Time format (TTFMT_*)
      o TZ=int. The Time Zone format (TZFMT_*)
      o KV=int. The KeyView date and time format
      where int is an integer value that corresponds to the desired format. Separate multiple formats with commas. For example:
      format="TD=0,TS=2,TT=1,TZ=1,KV=55"

<date>
    Output a Notes date.
    Attributes
    · name. (Required if there is no parent) The name of the item to output.
    · format. (Optional) See <format>, above. You can set the following values:
      o TD
      o TS
      o TT
      o TZ

<date_kv>
    Output a KeyView date.
    Attributes
    · name. (Required if there is no parent) The name of the item to output.
    · format. (Optional) See <format>, above. You can set the following values:
      o TZ
      o KV

<time>
    Output a time range, for example 1 hour, 30 minutes.
    Attributes
    · name. (Required if there is no parent) The item name of the start date or time.
    · item. (Required) The item name of the end date or time.

<zone>
    Output a Notes time zone mnemonic, for example MST.
    Attributes
    · name. (Required if there is no parent) The name of date item to output.

<zone_utc>
    Output a time zone as UTC, for example (UTC-06:00).

<logo>
    Output the mail header logo.
    The image link is included in the output; the actual image is output to a different part of the MHTML subfile.

<image>
    Output an image.
    The image link is included in the output; the actual image is output to the MHTML next part, as with <rich> and <body>, above.

<image_uri>
    Output an image URI, in quotation marks. The actual image is output to a different part of the MHTML subfile.
    Attributes
    · link. (Required if there is no file) The image link, such as a form or title name. For example:
      link="StdNotesLtr0"
    · file. (Required if there is no link) The name of the image file. The file must exist in the ../../templates/images directory. For example:
      file="boxcheck.gif"
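The format attribute of <format>, <date>, and <date_kv> is a comma-separated list of KEY=int pairs, such as TD=0,TS=2,TT=1,TZ=1,KV=55. The following sketch parses such a value into a map, which can help when checking a template by hand. It models only the syntax described above; how KeyView itself parses the attribute is not documented here, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class FormatAttribute {

    // The five keys that the format attribute accepts, per the <format> description.
    private static final Set<String> KEYS =
            new HashSet<>(Arrays.asList("TD", "TS", "TT", "TZ", "KV"));

    /** Parses a value such as "TD=0,TS=2" into an ordered key -> integer map. */
    static Map<String, Integer> parse(String format) {
        Map<String, Integer> out = new LinkedHashMap<>();
        if (format == null || format.trim().isEmpty()) {
            return out; // an omitted attribute resets to the defaults
        }
        for (String part : format.split(",")) {
            String[] kv = part.trim().split("=", 2);
            if (kv.length != 2 || !KEYS.contains(kv[0].trim())) {
                throw new IllegalArgumentException("Unrecognized format part: " + part);
            }
            out.put(kv[0].trim(), Integer.parseInt(kv[1].trim()));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parse("TD=0,TS=2,TT=1,TZ=1,KV=55"));
    }
}
```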

Date and Time Formats
This section lists the supported Notes and KeyView date and time formats for use with <format>, <date>, and <date_kv>.

Lotus Notes Date and Time Formats
This section lists supported Lotus Notes date and time formats, and the integer values that specify each one.

Lotus Notes date and time formats

Format

Integer Value

Description

TDFMT_FULL

0

(The Notes default) Year, month, and day

TDFMT_CPARTIAL 1

Month and day, year if not this year

TDFMT_PARTIAL 2

Month and day

TDFMT_DPARTIAL 3

Year and month

TDFMT_FULL4

4

Four-digit year, month, and day

TDFMT_

5

CPARTIAL4

TDFMT_

6

DPARTIAL4

Month and day, four-digit year if not this year Four-digit year and month

IDOL KeyView (12.12)

Page 228 of 280

Filter SDK Java Programming Guide Appendix E: Extract and Format Lotus Notes Subfiles

Lotus Notes date and time formats, continued

Format            Integer Value   Description
TTFMT_FULL        0               (Notes default) Hour, minute, and second
TTFMT_PARTIAL     1               Hour and minute
TTFMT_HOUR        2               Hour
TZFMT_NEVER       0               (Notes default) All time zones are converted to the current time zone
TZFMT_SOMETIMES   1               Show only when outside the current time zone
TZFMT_ALWAYS      2               Show for all time zones
TSFMT_DATE        0               Date
TSFMT_TIME        1               Time
TSFMT_DATETIME    2               (The Notes default) Date and time
TSFMT_CDATETIME   4               Date and time, or time today or time yesterday

KeyView Date and Time Formats

This section lists KeyView date and time formats. The KeyView formats use the following syntax:

Month        Month = full month name
             Mon = abbreviated month name
             m = month (number)
             mm = two-digit month (leading 0)
Weekday      Weekday = full weekday name
             Wday = abbreviated weekday name
Year         yy = two-digit year
             yyyy = four-digit year
Day          d = day (number)
             dd = two-digit day (leading 0)
Time         h = 12-hour
             H = 24-hour
             m = minutes
             s = seconds
             P = AM/PM
             p = am/pm
Separators   _ = space
             c = comma
             s = slash
             a = dash
             o = dot

KeyView date and time formats

Format                          Output                    Integer Value
12-Hour and 24-Hour Time Formats
KVDTF_P                         P                         1
KVDTF_P_hmm                     P h:mm                    2
KVDTF_hmm_P                     h:mm P                    3
KVDTF_P_hhmm                    P hh:mm                   4
KVDTF_hhmm_P                    hh:mm P                   5
KVDTF_P_hmmss                   P h:mm:ss                 6
KVDTF_hmmss_P                   h:mm:ss P                 7
KVDTF_P_hhmmss                  P hh:mm:ss                8
KVDTF_hhmmss_P                  hh:mm:ss P                9
KVDTF_Hmm                       H:mm                      10
KVDTF_HHmm                      HH:mm                     11
KVDTF_mmss                      mm:ss                     12
KVDTF_Hmmss                     H:mm:ss                   13
KVDTF_HHmmss                    HH:mm:ss                  14
Numerical Date Formats with Slashes
KVDTF_mmsdd                     mm/dd                     15
KVDTF_msdsyy                    m/d/yy                    16

KeyView date and time formats, continued

Format                          Output                    Integer Value
KVDTF_mmsddsyy                  mm/dd/yy                  17
KVDTF_mmsddsyyyy                mm/dd/yyyy                18
KVDTF_ddsmm                     dd/mm                     19
KVDTF_ddsmmsyy                  dd/mm/yy                  20
KVDTF_ddsmmsyy_Hmm              dd/mm/yy H:mm             21
KVDTF_ddsmm_P_hmm               dd/mm P h:mm              22
KVDTF_ddsmm_hmm_P               dd/mm h:mm P              23
KVDTF_ddsmm_P_hhmm              dd/mm P hh:mm             24
KVDTF_ddsmm_hhmm_P              dd/mm hh:mm P             25
KVDTF_ddsmmsyy_P_hmm            dd/mm/yy P h:mm           26
KVDTF_ddsmmsyy_hmm_P            dd/mm/yy h:mm P           27
KVDTF_ddsmmsyy_P_hmmss          dd/mm/yy P h:mm:ss        28
KVDTF_ddsmmsyy_hmmss_P          dd/mm/yy h:mm:ss P        29
KVDTF_ddsmmsyy_P_hhmmss         dd/mm/yy P hh:mm:ss       30
KVDTF_ddsmmsyy_hhmmss_P         dd/mm/yy hh:mm:ss P       31
KVDTF_yysmmsdd_P_hhmmss         yy/mm/dd P hh:mm:ss       32
KVDTF_yysmmsdd_hhmmss_P         yy/mm/dd hh:mm:ss P       33
KVDTF_msdsyy_Hmm                m/d/yy H:mm               34
KVDTF_mmsddsyy_Hmm              mm/dd/yy H:mm             35
KVDTF_msdsyy_P_hmm              m/d/yy P h:mm             36
KVDTF_msdsyy_hmm_P              m/d/yy h:mm P             37
KVDTF_mmsddsyy_hmm_P            mm/dd/yy h:mm P           38
KVDTF_mmsdd_P_hhmm              mm/dd P hh:mm             39
KVDTF_mmsdd_hhmm_P              mm/dd hh:mm P             40
KVDTF_mmsddsyy_P_hhmmss         mm/dd/yy P hh:mm:ss       41
KVDTF_mmsddsyy_hhmmss_P         mm/dd/yy hh:mm:ss P       42

KeyView date and time formats, continued

Format                          Output                    Integer Value
KVDTF_msd                       m/d                       43
KVDTF_yysm                      yy/m                      44
KVDTF_yysmm                     yy/mm                     45
KVDTF_yysmsd                    yy/m/d                    46
KVDTF_yysmmsdd                  yy/mm/dd                  47
KVDTF_yyyysmmsdd                yyyy/mm/dd                48
Numerical Date Formats with Dashes
KVDTF_ddammayy                  dd-mm-yy                  49
KVDTF_mmadd                     mm-dd                     50
KVDTF_mmayy                     mm-yy                     51
KVDTF_yyammadd                  yy-mm-dd                  52
KVDTF_yyyyammadd                yyyy-mm-dd                53
KVDTF_yyyyammaddaHHmmss         yyyy-mm-dd-HH:mm:ss       54
Numerical Date Formats with Dots
KVDTF_yyomod                    yy.m.d                    55
KVDTF_yyommodd                  yy.mm.dd                  56
KVDTF_mod                       m.d                       57
KVDTF_mmodd                     mm.dd                     58
Numerical and String Date Formats with Dashes, Commas, and Spaces
KVDTF_ddaMon                    dd-Mon                    59
KVDTF_daMonayy                  d-Mon-yy                  60
KVDTF_ddaMonayy                 dd-Mon-yy                 61
KVDTF_ddaMonayyyy               dd-Mon-yyyy               62
KVDTF_Mon                       Mon                       63
KVDTF_Monayy                    Mon-yy                    64
KVDTF_Monayyyy                  Mon-yyyy                  65

KeyView date and time formats, continued

Format                          Output                    Integer Value
KVDTF_Monaddayy                 Mon-dd-yy                 66
KVDTF_yyammadd_P_hhmmss         yy-mm-dd P hh:mm:ss       67
KVDTF_mmadd_P_hhmm              mm-dd P hh:mm             68
KVDTF_Mon_yy                    Mon yy                    69
KVDTF_Monc_yy                   Mon, yy                   70
KVDTF_Month                     Month                     71
KVDTF_Monthayy                  Month-yy                  72
KVDTF_Month_yy                  Month yy                  73
KVDTF_Monthc_yy                 Month, yy                 74
KVDTF_Monthayyyy                Month-yyyy                75
KVDTF_Month_yyyy                Month yyyy                76
KVDTF_Monthc_yyyy               Month, yyyy               77
KVDTF_Mon_dc_yyyy               Mon d, yyyy               78
KVDTF_d_Monc_yyyy               d Mon, yyyy               79
KVDTF_yyyy_Mon_d                yyyy Mon d                80
KVDTF_Month_dc_yyyy             Month d, yyyy             81
KVDTF_d_Monthc_yyyy             d Month, yyyy             82
KVDTF_yyyy_Month_d              yyyy Month d              83
Weekday Date Formats
KVDTF_Wday                      Wday                      84
KVDTF_Weekday                   Weekday                   85
KVDTF_Wdayc_Mon_dc_yyyy         Wday, Mon d, yyyy         86
KVDTF_Weekdayc_Month_dc_yyyy    Weekday, Month d, yyyy    87
KVDTF_Weekdayc_d_Monthc_yyyy    Weekday, d Month, yyyy    88

Appendix F: File Format Detection

This section describes how file formats are detected in Filter SDK.

· Introduction 234
· Extract Format Information 234
· Determine Format Support 234
· Translate Format Information 237
· Determine a Document Reader 238
· Additional Format Information 238

Introduction
The KeyView format detection module (kwad) detects a file's format, and reports the information to the API, which in turn reports the information to the developer's application. If the detected format is supported by the KeyView SDK, the detection module also loads the appropriate structured access layer and document reader for further processing. For a list of supported formats, see Document Readers, on page 173.

Extract Format Information
You can extract format information from a document by using one of the getDocFormatInfo methods. These methods extract the major format, file class, version, and document attributes, and populate the DocFormatInfo class. They return the format information as a string. The format information that you can extract is listed in the header file adinfo.h.
For information on how to translate the extracted format information, see Translate Format Information, on page 237.

Determine Format Support
After the file format is extracted, the detection module uses the formats.ini file to determine whether the format is supported by KeyView, and the appropriate structured access layer and reader to load.

The formats.ini file is in the directory install\OS\bin, where install is the path name of the Filter installation directory and OS is the name of the operating system. It contains the following information:
l Coded format information. To translate this information, see Translate Format Information, on page 237.
l The reader associated with each format. See Determine a Document Reader, on page 238.
l Configuration parameters.
l Locale settings for internal use.
Example formats.ini file entries
123=mw
152=xyw
178=wp6
189=mw6
2=af
200=pdf
205=mb
210=htm
251=htm
NOTE: The formats.ini file applies to all formats except graphics. Detection of graphics formats is handled by an internal module named KeyView Picture Interchange Format (KPIF).
Refine Detection of Text Files
During text detection, KeyView analyses the first 1 kB and last 1 kB of data in a document. If less than 10% of that data consists of non-ASCII characters, KeyView detects the document as a text file. However, depending on the type of documents you are working with, the default settings might not provide the desired level of accuracy. Configuration flags enable you to change the amount of data to read at the end of a file, the percentage of non-ASCII characters permitted in a text file, and whether to use or ignore the file extension to determine the document format.
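The default rule described above can be illustrated with a short Java sketch. This is not KeyView's detection code: the class and method names are hypothetical, and the sketch assumes that "non-ASCII" means any byte with the high bit set. It samples the first and last 1,024 bytes of the data and compares the share of non-ASCII bytes against the 10% threshold.

```java
/** Illustrative heuristic only; names are hypothetical and this is not KeyView's implementation. */
final class TextHeuristic {
    static final int BLOCK_SIZE = 1024;               // first and last 1 kB, as described in the text
    static final double MAX_NON_ASCII_PERCENT = 10.0; // default threshold from the text

    /** Counts bytes with the high bit set in data[from, to). Assumption: these are the "non-ASCII" bytes. */
    private static int countNonAscii(byte[] data, int from, int to) {
        int count = 0;
        for (int i = from; i < to; i++) {
            if ((data[i] & 0x80) != 0) {
                count++;
            }
        }
        return count;
    }

    /** Returns the percentage of non-ASCII bytes in the sampled head and tail blocks. */
    static double nonAsciiPercent(byte[] data) {
        if (data.length == 0) {
            return 0.0;
        }
        int headEnd = Math.min(BLOCK_SIZE, data.length);
        // Start the tail after the head so that short files are not counted twice.
        int tailStart = Math.max(headEnd, data.length - BLOCK_SIZE);
        int examined = headEnd + (data.length - tailStart);
        int nonAscii = countNonAscii(data, 0, headEnd)
                + countNonAscii(data, tailStart, data.length);
        return 100.0 * nonAscii / examined;
    }

    /** True when the sampled data stays below the non-ASCII threshold. */
    static boolean looksLikeText(byte[] data) {
        return nonAsciiPercent(data) < MAX_NON_ASCII_PERCENT;
    }
}
```

Because only the head and tail blocks are sampled, bytes in the middle of a large file do not affect the result, which is why the size of the end block and the threshold are configurable.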
Change the Amount of File Data to Read
During file detection, KeyView reads characters from the beginning and end of a file--by default, it reads the first and last 1,024 bytes of data. Large text files might contain many irrelevant characters at the end of a file, so KeyView might not accurately detect the file format. You can set a configuration flag to increase the amount of data to read from the end of a file during detection.

To change the amount of data to read during detection
l In the formats.ini file, set the following flag in the detection_flags section:
[detection_flags]
non_ascii_chars_end_block_size=kB
where kB is the number of kilobytes to read from the end of the file, from 0 to 10. The default value is 1.
NOTE: The file size must be greater than the value specified in the flag. If the flag value is greater than the file size, KeyView does not use the flag.
Change the Percentage of Allowed Non-ASCII Characters
By default, if less than 10% of the analyzed data in a document consists of non-ASCII characters, it is detected as a text file. Depending on the type of files that you are working with, changing the default percentage might increase detection accuracy.
To change the percentage of non-ASCII characters allowed in text files
l In the formats.ini file, set the following flag in the detection_flags section:
[detection_flags]
non_ascii_chars_in_text=N
where N is the percentage of non-ASCII characters to allow in text files. Files that contain a lower percentage of non-ASCII characters than N are detected as text files. The default value is 10.
Allow Consecutive NULL Bytes in a Text File
By default, if a document contains consecutive NULL bytes, it is not detected as text. Depending on the type of files that you are working with, changing the default might increase detection accuracy.
To allow consecutive NULL bytes of ASCII characters in text files
l In the formats.ini file, set the following flag in the detection_flags section:
[detection_flags]
ascii_allow_null_bytes=1
The default value is 0 (do not allow consecutive NULL bytes).
Use the File Extension for Detection
Sometimes KeyView detects certain file formats, such as CSV, as ASCII because of the content of the documents. In such cases, you can configure KeyView to use the file extension to determine the document format. Using the file extension can improve detection of formats such as CSV, but might not detect text files successfully if they have incorrect file extensions.

To use the file extension for ASCII files during detection
l In the formats.ini file, set the following flag in the detection_flags section:
[detection_flags]
use_extension_for_ascii=1
The default is 0 (do not use the file extension).
Translate Format Information
Format information can include file attributes in the following categories:
l Major format
l File class
l Minor format
l Major version
l Minor version
Not all categories are required. Many formats only include major format and file class, or major format only. The format information has the following structure:
MajorFormat.FileClass.MinorFormat.MajorVersion.MinorVersion
For example:
81.2.0.9.0
Each number in the format information represents a file attribute. The entry 81.2.0.9.0 represents a Lotus 1-2-3 Spreadsheet file version 9.0, where
81 = Lotus 1-2-3 Spreadsheet (major format)
2 = Spreadsheet (file class)
0 = not defined (minor format)
9 = 9 (major version)
0 = 0 (minor version)
This example applies to the formats.ini file. When extracting format information using the getDocFormatInfo methods, the same format is represented as 294.2.9.0.
NOTE: The format values returned from getDocFormatInfo differ from those in formats.ini because the former defines a unique ID for each major format, while the latter uses a major version, minor version, and minor format to distinguish between formats.
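As an illustration of the formats.ini layout, the following Java sketch splits a coded value such as 81.2.0.9.0 into its five categories. The FormatCode class is hypothetical and is not part of the KeyView API. It assumes that omitted trailing categories can be treated as 0, and it does not apply to strings returned by getDocFormatInfo, which use a different layout.

```java
/** Hypothetical holder for a formats.ini coded value; not part of the KeyView API. */
final class FormatCode {
    final int majorFormat;
    final int fileClass;
    final int minorFormat;
    final int majorVersion;
    final int minorVersion;

    private FormatCode(int[] v) {
        majorFormat = v[0];
        fileClass = v[1];
        minorFormat = v[2];
        majorVersion = v[3];
        minorVersion = v[4];
    }

    /**
     * Parses MajorFormat.FileClass.MinorFormat.MajorVersion.MinorVersion.
     * Assumption: categories that are left out default to 0.
     */
    static FormatCode parse(String code) {
        String[] parts = code.trim().split("\\.");
        if (parts.length > 5) {
            throw new IllegalArgumentException("Too many categories: " + code);
        }
        int[] values = new int[5]; // missing trailing categories stay 0
        for (int i = 0; i < parts.length; i++) {
            values[i] = Integer.parseInt(parts[i]); // throws NumberFormatException on bad input
        }
        return new FormatCode(values);
    }
}
```

For example, FormatCode.parse("81.2.0.9.0") yields major format 81, file class 2, minor format 0, major version 9, and minor version 0.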

Distinguish Between Formats
The DocFormatInfo class provides a unique ID for each major format. For example, a call to getDocFormatInfo would return 351.1.0 for a Microsoft Word XML format. The major format 351 is unique to this format. Unlike DocFormatInfo, the formats.ini file distinguishes between formats by using the major version number. For example, in the formats.ini file, a Microsoft Word 2003 XML format is defined as 285.1.0.100.0. The major format 285 and file class 1 are the same values for generic XML. The major version 100 distinguishes the format as Microsoft Word 2003 XML. The major version is used to specify the following formats:
l Microsoft Office 2003 XML. This format has the same major format and file class as generic XML (285.1). It is distinguished from generic XML by using the following major versions:
   o Word: 100
   o Excel: 101
   o Visio: 110
l The XHTML format has the same major format and file class as HTML (210.1). It is distinguished from HTML by using the major version 100.
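The major-version rule for formats.ini can be sketched in Java as follows. The XmlVariant class and its describe method are hypothetical; the numeric values are the ones listed above, and any combination that is not listed is reported as "Unknown" purely for illustration.

```java
/** Hypothetical helper that applies the formats.ini major-version rule described above. */
final class XmlVariant {
    static String describe(int majorFormat, int fileClass, int majorVersion) {
        if (majorFormat == 285 && fileClass == 1) {
            // Generic XML family: the major version selects the Office 2003 variant.
            switch (majorVersion) {
                case 100: return "Microsoft Word 2003 XML";
                case 101: return "Microsoft Excel 2003 XML";
                case 110: return "Microsoft Visio 2003 XML";
                default:  return "Generic XML";
            }
        }
        if (majorFormat == 210 && fileClass == 1) {
            // HTML family: major version 100 marks XHTML.
            return majorVersion == 100 ? "XHTML" : "HTML";
        }
        return "Unknown"; // formats outside this illustration
    }
}
```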
Determine a Document Reader
The format detection module uses the formats.ini file to determine whether a format is supported, and to determine the reader to use to parse a format. The entries in the formats.ini file list each format's coded value, and an abbreviation for the format's reader. The reader abbreviation is a truncated version of the reader's library name. Adding "sr" to the end of an abbreviation creates the name of the reader. For example, the following entry specifies that a Lotus 1-2-3 Spreadsheet file version 9.0 is parsed by the Lotus 1-2-3 filter, l123sr:
81.2.0.9.0=l123
List of Required Files for Redistribution, on page 239 lists the readers provided with KeyView.
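The abbreviation rule can be shown with a minimal Java sketch. The ReaderEntry class is hypothetical. It assumes that the line passed in is a format entry of the form code=abbreviation; formats.ini also contains configuration parameters and locale settings, which this sketch does not try to recognize.

```java
/** Hypothetical parser for a single formats.ini format entry such as "81.2.0.9.0=l123". */
final class ReaderEntry {
    final String formatCode;    // coded format value, for example 81.2.0.9.0
    final String readerLibrary; // abbreviation with "sr" appended, for example l123sr

    private ReaderEntry(String formatCode, String readerLibrary) {
        this.formatCode = formatCode;
        this.readerLibrary = readerLibrary;
    }

    static ReaderEntry parse(String line) {
        int eq = line.indexOf('=');
        if (eq <= 0 || eq == line.length() - 1) {
            throw new IllegalArgumentException("Not a code=abbreviation entry: " + line);
        }
        String code = line.substring(0, eq).trim();
        String abbreviation = line.substring(eq + 1).trim();
        // Adding "sr" to the abbreviation gives the reader's library name.
        return new ReaderEntry(code, abbreviation + "sr");
    }
}
```

With this sketch, the entry 200=pdf maps format code 200 to the reader name pdfsr.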
Additional Format Information
The ADDOCINFO class returns basic information about a document's format, but sometimes it can be useful to have additional information. The file format_descriptions.tsv, which can be found in the bin directory, provides a mapping between file format ID, human-readable format description, and the format's MIME type (if one exists). This file is in tab-delimited format, and the tab character will only appear as a delimiter. This information is available in the documentation (see the section Supported Formats, on page 105), but the TSV file provides it in a machine-readable format.
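A tab-delimited file like this is straightforward to load into a lookup table. The Java sketch below is a hypothetical helper, not part of the KeyView API. It assumes that the columns appear in the order format ID, description, MIME type, and that any row whose first column is not an integer (for example, a header row) can be skipped; check both assumptions against the actual file.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical loader for format_descriptions.tsv; the column order is an assumption. */
final class FormatDescriptions {
    static final class Entry {
        final String description;
        final String mimeType; // empty string when no MIME type is listed

        Entry(String description, String mimeType) {
            this.description = description;
            this.mimeType = mimeType;
        }
    }

    /** Builds a map from format ID to description and MIME type. */
    static Map<Integer, Entry> parse(Iterable<String> lines) {
        Map<Integer, Entry> byId = new LinkedHashMap<>();
        for (String line : lines) {
            // Tabs appear only as delimiters, so a plain split is sufficient.
            String[] cols = line.split("\t", -1);
            if (cols.length < 2) {
                continue; // blank or malformed line
            }
            int id;
            try {
                id = Integer.parseInt(cols[0].trim());
            } catch (NumberFormatException e) {
                continue; // assumed header or comment row
            }
            String mime = cols.length > 2 ? cols[2].trim() : "";
            byId.put(id, new Entry(cols[1].trim(), mime));
        }
        return byId;
    }
}
```

The lines can be read with java.nio.file.Files.readAllLines and passed directly to parse.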


Appendix G: List of Required Files for Redistribution
This section lists the Filter files that can be redistributed in your applications under the licensing agreement. Unless noted, these files are in the directory install\OS\bin, where install is the path of the Filter installation directory and OS is the operating system platform.
NOTE: On Windows systems, the libraries are .dll files. On UNIX systems, the libraries are .so, .a, or .sl files.

Core Files

The following core files can be redistributed with your application.

File                  Description
formats.ini           Initialization file. For more information on this file, see Determine Format Support, on page 234.
FilterDotNet.dll      The .NET API.
filterfordotnet.dll   Required by the .NET API.
KeyView.jar           The Java API.
                      NOTE: This file can be found at the path install/javaapi/KeyView.jar where install is the Filter SDK installation directory.
*KeyViewFilter.*      Required by the Java API.
kpifcnvt.*            For presentation graphics, converts from one picture format to another.
kpifutil.*            Utility for handling the internal picture interchange format for presentation graphics.
kvfilter_nsl.a        (AIX platforms only.) Alternative Filter API implementation using POSIX standards for starting new processes. See The Filter Process Model, on page 25.
kvxtract.*            File Extraction API.
kvfilter.*            Filter API.
kvolefio.*            Embedded OLE object writer.

File                  Description
kvutil.*              Internal KeyView utility functions.
kvxpgsa.*             Interface between presentation readers and kvfilter. Required to extract metadata from AutoCAD files.
kvxsssa.*             Interface between spreadsheet readers and kvfilter.
kvxwpsa.*             Interface between word processing readers and kvfilter.
kvzip.*               Zip writer.
kwad.*                File auto-recognition module.
txtcnv.*              Converter for document token stream.
vcredist\*            (Windows platforms only) Microsoft Visual C++ Redistributable Packages. For more information about these files, see Software Dependencies, on page 14.
                      NOTE: The vcredist folder is located at the root of the SDK, and not in the bin directory.

Support Files

The following support files can be redistributed with your application.

File                      Description
datafiles\*               (Folder) Required by kvlangdetect
NSFtemplates\*            (Folder) Templates used by nsfsr to format Lotus mail notes
7z.*                      Required by z7zsr and multiarcsr
bentofio.*                Required by l123sr and kpprzrdr.
cbmap.map                 Character mappings for Adobe Portable Document Format (PDF).
CEBDLL.dll                Required by cebsr.
chartbls.ux               Character mappings.
chmdll.*                  Required by chmsr.
codeidentifierplugin.*    Required for source code identification
cpstsdk.*                 Required by pstxsr.
DFECore.dll               Required by cebsr.

File                      Description
Filter.dll                Required by cebsr.
kpbmpwrt.*                Required for processing bmp files.
kppng.*                   Required for ZLIB decompression.
kvdecrypt.*               Decryption utility functions.
kvlangdetect.*            Utility functions for language and character set detection.
kvxconfig.ini             Contains element extraction settings for XML files.
kvoop.*                   Required for out-of-process filtering.
kvthread.*                Required for multithreaded out-of-process filtering.
kv.lic                    Contains license information for KeyView products. This file is opened and validated when a KeyView API is used.
*langdetectext.*          Required by kvlangdetect.*.
libpff.*                  Required by pffsr.
libcrypto*                SSL utility functions used by KeyView mail format readers.
libstlport.so.1           (Solaris platforms only) Solaris Studio Redistributable. This file is located in install/OS/lib.
tabledata.dat             Required for table detection.
unzipjpg.*                Required for JPEG decompression.
wpmap.*                   Extended character mapping for WordPerfect and Corel Presentation.
xmlsh.*                   Contains a library of content handlers for each XML file type. Required by the Expat XML parser.

Document Readers

The following readers can be redistributed with your application.

File             Description
ad1sr.*          AD1 Evidence file reader
afsr.*           ASCII reader
aiffsr.*         Audio Interchange Format File (AIFF) reader
asfsr.*          Advanced Systems Format reader
assr.*           Applix Spreadsheet reader
awsr.*           Applix Word reader
b1sr.*           B1 archive reader
bkfsr.*          Microsoft Backup File reader
bmpsr.*          Windows bitmap (BMP) reader
bzip2sr.*        Bzip2 reader
cabsr.*          Microsoft Cabinet format reader
cebsr.*          Founder Chinese E-paper Basic reader
chmsr.*          Microsoft Compiled HTML Help reader
csvsr.*          Comma-Separated Values reader
dbfsr.*          dBase Database reader
dbxsr.*          Microsoft Outlook Express DBX reader
dcasr.*          Document Content Architecture/Revisable Form Text (DCA/RFT) reader
dcmsr.*          Digital Imaging and Communications in Medicine (DICOM) reader
difsr.*          Data Interchange Format reader
dmgsr.*          Mac Disk Copy Disk Image File reader
dw4sr.*          DisplayWrite reader
dxlsr.*          Domino XML Language reader
emlsr.*          Microsoft Outlook Express (EML) reader. This is used to filter EML files when the MBX reader is not licensed.
emxsr.*          Legato EMailXtender (EMX) reader
encasesr.*       Expert Witness Compression Format (EnCase) v6 reader
encase2sr.*      Expert Witness Compression Format (EnCase) v7 reader
entsr.*          Microsoft Entourage Database Format reader
epubsr.*         Open Publication Structure eBook reader
foliosr.*        Folio Flat File reader
gdsiisr.*        Graphic Database System (GDSII) reader

File             Description
gifsr.*          Graphics Interchange Format (GIF) reader
gwfssr.*         GroupWise FileSurf reader
hl7sr.*          Health level7 reader (metadata only)
htmsr.*          HTML and XHTML reader
hwpsr.*          Hangul 97 reader
hwposr.*         Hangul 2002, 2005, 2007 reader
ichatsr.*        Apple iChat Log reader
icssr.*          Microsoft Outlook iCalendar reader
isosr.*          ISO-9660 CD Disc Image Format reader
iwss13sr.*       iWork 13 Numbers reader
iwwp13sr.*       iWork 13 Pages reader
iwwpsr.*         Apple iWork Pages reader
iwsssr.*         Apple iWork Numbers reader
jp2000sr.*       JPEG 2000 metadata reader
jpgsr.*          JPEG metadata reader
jtdsr.*          JustSystems Ichitaro reader
kpagrdr.*        Applix Presentations reader
kpcatrdr.*       CATIA format reader
kpcgmrdr.*       Computer Graphics Metafile reader
kpdwgrdr.*       AutoCAD Drawing format reader
kpdxfrdr.*       AutoCAD Drawing Exchange format reader
kpemfrdr.*       Enhanced Metafile reader
kpgflrdr.*       Omni Graffle reader
kpgifrdr.*       Graphic Interchange Format (GIF) reader
kpiwpg13rdr.*    iWork 13 keynote reader
kpiwpgrdr.*      Apple iWork Keynote reader
kpjbig2rdr.*     JBIG2 reader

File             Description
kpjp2000rdr.*    JPEG 2000 reader
kpmsordr.*       Microsoft Office Drawing Objects (office 97, 2000, and XP) reader
kpnbmprdr.*      Notes Bitmap reader (for embedded images in DXL files)
kpodardr.*       AutoCAD reader
kpodfrdr.*       Oasis Open Document Format presentation (ODP) reader
kpoxdrdr.*       Open Office XML Diagram Graphics reader.
kpp40rdr.*       Microsoft PowerPoint PC 4.0 and PowerPoint Mac reader
kpp95rdr.*       Microsoft PowerPoint 95 reader
kpp97rdr.*       Microsoft PowerPoint 97 and higher reader
kppctrdr.*       Macintosh Quick Draw Picture (PICT) reader
kppicrdr.*       Pictor PC Paint (PIC) reader
kppngwrt.*       Portable Network Graphics (PNG) reader
kpppxrdr.*       Microsoft PowerPoint XML reader 2007
kpprerdr.*       Lotus Freelance Graphics for Windows V2.0 reader
kpprzrdr.*       Lotus Freelance Graphics 96/97/98 reader
kpsddrdr.*       StarOffice Impress reader
kpsdwrdr.*       Lotus Ami Pro Graphics reader
kpshwrdr.*       Corel Presentations reader
kptifrdr.*       Tagged Image File (TIF) reader
kpugrdr.*        Unigraphics (UG) NX reader
kpvsd2rdr.*      Microsoft Visio reader
kpvsdxrdr.*      Microsoft Visio 2013 reader
kpwg2rdr.*       WordPerfect Graphics 2 reader
kpwmfrdr.*       Windows Metafile reader
kpwpgrdr.*       WordPerfect Graphics 1 reader
kpxfdlrdr.*      Extensible Forms Description Language reader
kvgzsr.*         GZIP reader
kvhqxsr.*        BinHex reader

File             Description
kvzeesr.*        UNIX Compress reader
l123sr.*         Lotus 123 v96/97/98 reader
lasr.*           Lotus AMI Pro reader
ltbenn30.dll     Lotus Word Pro support (supported on Windows x86 platform only)
ltscsn10.dll     Lotus Word Pro support (supported on Windows x86 platform only)
lwpapin.dll      Lotus Word Pro support (supported on Windows x86 platform only)
lwppann.dll      Lotus Word Pro support (supported on Windows x86 platform only)
lwpsr.dll        Lotus Word Pro reader (supported on Windows x86 platform only)
lzhsr.*          Microsoft Compression Folder reader
macbinsr.*       MacBinary reader
mbsr.*           Microsoft Word Macintosh reader
mbxsr.*          Mailbox (MBX) and Microsoft Outlook Express (EML) reader 1
mdbsr.*          Microsoft Access reader
mhtsr.*          MIME HTML reader
mifsr.*          Adobe Maker Interchange reader
misr.*           Microsoft Word 2 reader
mp3sr.*          MP3 reader for metadata extraction
mpeg4sr.*        MPEG-4 Audio file reader
mppsr.*          Microsoft Project reader
msgsr.*          Microsoft Outlook (MSG) reader
mspubsr.*        Microsoft Publisher reader
msw6sr.*         Microsoft Works 6 and 2000 reader
mswsr.*          Microsoft Works V1 and 2 reader
multiarcsr.*     ARJ reader
mw6sr.*          Microsoft Word 95 reader
mw8sr.*          Microsoft Word 97, 2000, and XP reader

1 This reader is an advanced feature and is sold and licensed separately from KeyView Filter SDK. See License Information, on page 17.

File             Description
mwsr.*           Microsoft Word for DOS and Microsoft Write reader
mwssr.*          Microsoft Works Spreadsheet reader
mwxsr.*          Microsoft Word 2007 XML reader
nsfsr.*          Lotus Notes database reader 1
oa2sr.*          Fujitsu Oasys reader
odfsssr.*        Oasis Open Document Format spreadsheets (ODS) reader
odfwpsr.*        Oasis Open Document Format word processing (ODT) reader
olesr.*          Embedded OLE object reader
olmsr.*          Microsoft Outlook for Macintosh reader
onealtsr.*       Microsoft OneNote Alternate Format reader
onesr.*          Microsoft OneNote Format reader
pmesr.*          Plazmic Media Engine data file reader
onmsr.*          Legato EMailXtender Native Message reader
oo3sr.*          Omni Outliner reader
pbixsr.*         Microsoft Power BI file (PBIX) reader
pdf2sr.*         Alternative Adobe Portable Document Format file (PDF) reader
pdfsr.*          Adobe Portable Document Format file (PDF) reader
pfilesr.*        Microsoft Rights Management System encryption file reader
pffsr.*          Microsoft Outlook Offline Storage File reader
pngsr.*          Portable Network Graphics (PNG) reader
psdsr.*          Adobe Photoshop Document (PSD) reader
pstsr.dll        Microsoft Outlook Personal Folders file MAPI-based reader (supported on Windows platform only) 1
pstnsr.*         Microsoft Outlook Personal Folders file native reader 1
pstxsr.*         Microsoft Outlook Personal Folders file native reader 1
qpssr.*          Corel Quattro Pro spreadsheet reader
qpwsr.*          Corel Quattro Pro version X4 spreadsheet reader
rarsr.*          RAR Archive reader

File             Description
riffsr.*         Microsoft WAVE reader
rtfsr.*          Microsoft Rich Text reader
skypesr.*        Skype log file reader
sosr.*           StarOffice/OpenOffice reader
starcsr.*        StarOffice Calc reader
starwsr.*        StarOffice Writer reader
sunadsr.*        Sun Audio Data reader
swfsr.*          Macromedia Flash reader
tarsr.*          Tape archive reader
tifsr.*          TIFF reader (metadata only)
tnefsr.*         Transport Neutral Encapsulation Format reader
unihtmsr.*       Unicode HTML reader
unisr.*          Unicode reader
unzip.*          Zip file reader
utf8sr.*         UTF-8 reader
uudsr.*          UUEncoding reader
vcfsr.*          Microsoft Outlook vCard Contact reader
vsdsr.*          Microsoft Visio reader
wkssr.*          Lotus 123 v2.0 through 5.0 reader
wosr.*           WordPerfect 5.x reader
wp6sr.*          WordPerfect 6.0 through 10.0 reader
wpmsr.*          WordPerfect for Macintosh reader
xlsbsr.*         Microsoft Office 2007 Excel Binary Format reader
xlssr.*          Microsoft Excel reader
xlsxsr.*         Microsoft Excel 2007 XML reader
xmlsr.*          Generic XML reader
xpssr.*          XML Paper Specification reader
xywsr.*          XYWrite reader

File             Description
yimsr.*          Yahoo! Instant Messenger reader
z7zsr.*          7-Zip reader


Appendix H: Develop a Custom Reader

This section describes how to develop a reader for a format not supported by KeyView.

· Introduction 249
· How to Write a Custom Reader 250
· Development Tips 260
· Functions 261

Introduction
The Filter SDK enables you to write custom readers for formats not directly supported by KeyView. A reader is required to parse the file format and generate a KeyView token stream, which represents the content and format of the document. Filter can then use this token stream to generate a text version of the original document. The readers interact with a structured access layer and a writer to generate a text file in Filter, an HTML file in HTML Export, an XML file in XML Export, and a near-to-original view of the document in the Viewing SDK.
The complexity of a custom reader depends on the file format used by the source document type. A simple reader extracts only the textual content, and ignores formatting and all other non-textual content. Readers of increasing complexity must address one or more of the following:
l formatting (including fonts, foreground and background colors, paragraph borders and shading, character and paragraph styles)
l tables
l lists
l headers
l footers
l footnotes
l endnotes
l graphics
l bookmarks to internal links
l hyperlinks to external documents or webpages
l other structures, such as a table of contents or index
Even a simple reader might have to parse the following components of a document:
l word processing commands or tags
l encrypted or encoded text
l multiple character sets
l text modified, but retained within the file
l text displayed in an order other than its physical occurrence within the source file
It is very important to fully understand the file specification for the file format used by the document. This is essential in determining how to parse the source file and generate a token stream that accurately and effectively represents the original document.
Within Filter, the custom reader must interact with a structured access layer and the format detection API, which in turn interacts with the top-level API. For a description of the Filter architecture, see Architectural Overview, on page 20.
The custom reader must have a module definition file (*.def) that defines the exported API function calls. In addition, the formats.ini file must be modified to identify the custom reader and its associated format detection function.
See the source code for the sample custom reader (utf8sr), which parses plain text files encoded in UTF-8. The source code is in the directory install/samples/utf8sr, where install is the path name of the Filter installation directory.
How to Write a Custom Reader
Two include files define the requirements for a custom reader: kvcfsr.h and kvtoken.h. The definitions of the KeyView tokens are in kvtoken.h. For more information on tokens, see Token Buffer, on the next page.
The file kvcfsr.h defines two structures: TPReaderInterface and adTPDocInfo. The TPReaderInterface structure defines the API functions implemented by the custom reader. For basic readers, only the first four functions must be implemented. These functions are called by the structured access layer to parse the source file and generate the token stream.
All readers must be thread-safe. This means that global variables must not be used. To pass information between functions, it is necessary to define a "global" context structure that stores all information required throughout the life of the DLL. The initial parameter of all but one of the TPReaderInterface functions is a pointer to a global context structure defined for the custom reader.
The adTPDocInfo structure defines the information required for the format detection API, which associates the custom reader with the required file format.
Naming Conventions
Use the following naming conventions for functions and files:
l The initial letters of the custom reader file name should identify the file format being parsed. For example, pdf for Adobe PDF files, rtf for RTF files, and xls for Microsoft Excel files. In the examples in this appendix, this is represented by xxx.
l The name of the shared library must end with the letters sr.
l The names of the exported functions in the module definition file must be xxxGetReaderInterface and xxxsrAutoDet.
NOTE: The letters sr are excluded from xxxGetReaderInterface, but are included in xxxsrAutoDet.
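The naming rules can be summarized with a small Java sketch that derives the expected names from the format prefix (xxx). The ReaderNames class is a hypothetical helper for illustration only; the custom reader itself is written against the C headers described above.

```java
/** Hypothetical helper that derives custom reader names from the format prefix (xxx). */
final class ReaderNames {
    /** Shared library name: the prefix followed by the letters sr. */
    static String libraryName(String prefix) {
        return prefix + "sr";
    }

    /** Exported reader interface function: sr is NOT part of this name. */
    static String interfaceFunction(String prefix) {
        return prefix + "GetReaderInterface";
    }

    /** Exported format detection function: sr IS part of this name. */
    static String autoDetectFunction(String prefix) {
        return prefix + "srAutoDet";
    }
}
```

For the prefix utf8, this gives the library name utf8sr and the exported function names utf8GetReaderInterface and utf8srAutoDet.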

Basic Steps
The basic steps for developing a custom reader are as follows.
To develop a custom reader
1. Design the global context structure.
2. Write the basic API functions:
   l xxxAllocateContext()
   l xxxInitDoc()
   l xxxFillBuffer()
   l xxxFreeContext()
   l xxxCharSet()
   l xxxsrAutoDet()
   From within the xxxFillBuffer() function, it is necessary to call other functions that repeatedly read a chunk of a source file, parse the chunk, and generate a token stream until the entire source file is processed.
3. Map all but the last function to the TPReaderInterface structure.
4. Write the module definition file (*.def), exporting the reader interface and format detection functions.
5. Modify the formats.ini file to identify the custom reader and its associated format detection function. See xxxsrAutoDet(), on page 261. For example, the following lines would be added to the [Formats] section of the formats.ini file for the UTF-8 reader:
   456.1.0.0=utf8
   [CustomFilters]
   1=utf8sr

Token Buffer

Filter technology parses the native file structure to generate an intermediate stream called a token buffer. The token buffer consists of multiple sequences of tokens, which are defined in kvtoken.h and listed below.

#define KVT_TEXT           0x00 /* PutText() */
#define KVT_PARAINFO       0x01 /* SetParaInfo() */
#define KVT_SETTABS        0x02 /* SetTabs() */
#define KVT_TAB            0x03 /* Tab() */
#define KVT_MODE           0x04 /* SetMode() */
#define KVT_PARASPACE      0x05 /* SetParaSpace() */
#define KVT_ROWDEFN        0x06 /* DefineRow(), EndTable() */
#define KVT_COLUMNS        0x07 /* StartColumns(), etc. */
#define KVT_CELLSTART      0x08 /* NextCell() */
#define KVT_BITMAP         0x09 /* Reserved for annotations. */
#define KVT_PAGEOBJ        0x0A /* PutHeader(), PrintPage(), etc. */
#define KVT_NOOP           0x0B /* Just skip a BYTE. */
#define KVT_PAGE_BREAK     0x0C /* PageBreak() */
#define KVT_PARA_BREAK     0x0D /* ParaEnd() */
#define KVT_LINE_BREAK     0x0E /* LineBreak() */
#define KVT_SET_FONT       0x0F /* SetFont() */
#define KVT_PAGE           0x10 /* SetPageInfo() */
#define KVT_HOTSPOT        0x11 /* StartHotSpot() */
#define KVT_LINESPACE      0x12 /* SetLineSpacing() */
#define KVT_COLOR          0x13 /* VESetTextColor(), VESetBkColor() */
#define KVT_PICTURE        0x14 /* PutPicture() */
#define KVT_CELLMERGE      0x15 /* MergeCells() */
#define KVT_RULE           0x16 /* HorzRule() */
#define KVT_PATTERN        0x17 /* StartPattern(), etc. */
#define KVT_BORDER         0x18 /* StartParaBorder(), etc. */
#define KVT_HEADING        0x19 /* PutParaHeading() */
#define KVT_LISTING        0x1A /* StartList(), etc. */
#define KVT_CHARSET        0x1B /* SetCharSet() */
#define KVT_STYLE          0x1C /* PutCharStyle(), PutParaStyle() */
#define KVT_BIDI           0x1D /* Set Bidirectional text */
#define KVT_LOCALE         0x1E /* Set locale of a document */
#define KVT_ZONE           0x1F /* StartZone(), EndZone() */
#define KVT_POSITION       0x20 /* SetPosition(), etc. */
#define KVT_AUTOREC        0x21 /* Reserved for Internal Use */
#define KVT_METADATA       0x22 /* Reserved for Internal Use */
#define KVT_BYTEORDER      0x23 /* SetByteOrder() */
#define KVT_PARASPACEAUTO  0x24 /* SetParaSpaceAuto() */
#define KVT_ATTACH         0x25 /* PutAttachment() */
#define KVT_TOCPRINTIMAGE  0x26 /* StartTOCPrintImage(), etc. */
#define KVT_STREAM         0x27 /* PutStream(), Reserved */
#define KVT_REVISIONMARK   0x28 /* StartRevisionMark(), EndRevisionMark(),
                                   SetRMAuthor(), SetRMDateTime() */
#define KVT_DOCXTRINFO     0x29 /* SetDocXtrInfo() */
#define KVT_PCTEMDFT       0x30 /* SetPctEmdFt() */

A token is a single-byte identifier that corresponds to attributes in a document. Each token has one or more associated macros that provide detailed information about an attribute. Many of these tokens define components of the document, such as page margins, line indentation, and foreground and background color. Collectively, these are referred to as the state of the document. This state changes as the document is parsed.


Macros

Some of the macros are simple, while others are complicated. An example of a simple macro is ParaEnd(pcBuf), which terminates the current paragraph.

#define ParaEnd(pcBuf)                        \
{                                             \
    *pcBuf++ = KVT_PARA_BREAK;                \
    KVT_PUTINT(pcBuf, KVTSIZE_PARA_BREAK);    \
}

In Filter SDK, this generates an 0x0d, 0x0a pair of bytes on a Windows machine. In HTML Export this can generate a <p style="..."> element, depending on the value of other paragraph attributes.
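The way a macro such as ParaEnd() appends a token to the buffer can be illustrated with a minimal, self-contained sketch. The names DEMO_T_PARA_BREAK, DEMO_PUTINT, DemoParaEnd, and demo_emit are hypothetical, and the assumption that the size field is a 16-bit value written low byte first is made only for illustration; the real KVT_PUTINT and KVTSIZE_PARA_BREAK are defined in kvtoken.h and are not shown in this guide.

```c
#include <stddef.h>

#define DEMO_T_PARA_BREAK 0x0D   /* same value as KVT_PARA_BREAK in the listing */

/* Hypothetical stand-in for KVT_PUTINT: writes a 16-bit value, low byte first
 * (an assumption, not the documented encoding). */
#define DEMO_PUTINT(p, v)                                \
    do {                                                 \
        *(p)++ = (unsigned char)((v) & 0xFF);            \
        *(p)++ = (unsigned char)(((v) >> 8) & 0xFF);     \
    } while (0)

/* Appends a paragraph-break token: one identifier byte followed by a size
 * field. The size value 3 (1 + 2 bytes) is an illustrative assumption. */
#define DemoParaEnd(p)                  \
    do {                                \
        *(p)++ = DEMO_T_PARA_BREAK;     \
        DEMO_PUTINT(p, 3);              \
    } while (0)

/* Writes two paragraph breaks and returns the number of bytes produced. */
static size_t demo_emit(unsigned char *buf)
{
    unsigned char *p = buf;
    DemoParaEnd(p);
    DemoParaEnd(p);
    return (size_t)(p - buf);
}
```

The sketch wraps each macro body in do { ... } while (0) so that it behaves as a single statement inside if/else branches; the macros in kvtoken.h use a plain brace block instead.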

One of the more complicated macros is PutPictureEx().

#define PutPictureEx(pcBuf, lpszKey, cx, cy, flags,                  \
                     scaleHeight, scaleWidth,                        \
                     cropFromL, cropFromT, cropFromR, cropFromB,     \
                     anchorHorizontal, anchorVertical, offsetX, offsetY) \
{                                                                    \
    PutPic(pcBuf, lpszKey, cx, cy, flags,                            \
           scaleHeight, scaleWidth,                                  \
           cropFromL, cropFromT, cropFromR, cropFromB,               \
           anchorHorizontal, anchorVertical, offsetX, offsetY,       \
           180, 0, 180, 0, -1, 0, 0, 0, 0)                           \
}

You can generate a representation of the token stream by running filtertest.exe with the -d command-line option. This stream does not include the tokens generated for headers or footers. The filtertest.exe is in the directory install\samples\utf8\bin, where install is the path name of the Filter installation directory.

Reader Interface
All custom readers use the reader interface defined in kvcfsr.h. The members of this structure are:
fpAllocateContext()
fpInitDoc()
fpFillBuffer()
fpFreeContext()
fpHotSpothit()
fpGetSummaryInfo()
fpOpenStream()
fpCloseStream()
fpGetURL()
fpGetCharSet()
NOTE: fpHotSpothit() and fpGetURL() are currently reserved and must be NULL.

Function Flow
The structured access layer calls the functions as follows:
1. fpAllocateContext() is called and returns a pointer to the global context structure.
2. After further processing within the structured access layer, fpInitDoc() is called. This function performs all required initialization for the global context structure and then returns control to the structured access layer.
3. After further processing within the structured access layer, the fpFillBuffer() function is called repeatedly until the document is completely parsed.
4. Finally, fpFreeContext() is called. This function frees all memory allocated within the custom reader and then returns control to the structured access layer.
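The call sequence above can be sketched with a mock driver. Everything in this example is hypothetical: DemoReaderInterface is a simplified stand-in with four function pointers, not the real TPReaderInterface layout from kvcfsr.h, and the demo_* functions only simulate a reader that turns source bytes into the same number of output bytes. It is meant to show the allocate, initialize, fill-until-100%, free order and the use of a per-instance context instead of global variables.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for TPReaderInterface (not the real layout). */
typedef struct {
    void *(*fpAllocateContext)(void);
    int   (*fpInitDoc)(void *ctx, long fileSize);
    int   (*fpFillBuffer)(void *ctx, unsigned char *buf, unsigned *written,
                          int *percentDone, unsigned bufMax);
    int   (*fpFreeContext)(void *ctx);
} DemoReaderInterface;

/* Per-instance state lives here instead of in global variables. */
typedef struct {
    long fileSize;
    long processed;
} DemoContext;

static void *demo_allocate(void)
{
    return calloc(1, sizeof(DemoContext));   /* zero-initialized context */
}

static int demo_init(void *ctx, long fileSize)
{
    ((DemoContext *)ctx)->fileSize = fileSize;
    return 0;
}

/* Pretends each call consumes up to bufMax source bytes. */
static int demo_fill(void *ctx, unsigned char *buf, unsigned *written,
                     int *percentDone, unsigned bufMax)
{
    DemoContext *c = (DemoContext *)ctx;
    long left = c->fileSize - c->processed;
    unsigned n = (left < (long)bufMax) ? (unsigned)left : bufMax;
    memset(buf, 0, n);
    c->processed += n;
    *written = n;
    *percentDone = c->fileSize ? (int)(c->processed * 100 / c->fileSize) : 100;
    return 0;
}

static int demo_free(void *ctx)
{
    free(ctx);
    return 0;
}

static const DemoReaderInterface demo_reader = {
    demo_allocate, demo_init, demo_fill, demo_free
};

/* Mimics the caller: allocate, init, fill until 100%, free.
 * Returns the number of fill calls, or -1 on failure. */
static int demo_run(const DemoReaderInterface *r, long fileSize)
{
    unsigned char buf[64];
    unsigned written = 0;
    int percent = 0, calls = 0;
    void *ctx = r->fpAllocateContext();
    if (ctx == NULL) return -1;
    if (r->fpInitDoc(ctx, fileSize) != 0) { r->fpFreeContext(ctx); return -1; }
    while (percent < 100) {
        if (r->fpFillBuffer(ctx, buf, &written, &percent, sizeof buf) != 0) {
            calls = -1;
            break;
        }
        calls++;
    }
    r->fpFreeContext(ctx);
    return calls;
}
```

For example, demo_run(&demo_reader, 200) drives the mock reader through several fill calls with a 64-byte output buffer before the context is freed.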
Related Topics
• Functions, on page 261
Example Development of fffFillBuffer()
The following is an example of how the fpFillBuffer() function in foliosr could be developed. The example demonstrates how the code changes as limitations of the implementation are identified. With each implementation, code revisions are shown in bold.
Implementation 1--fpFillBuffer() Function
/*****************************************************************
 * Function: fffFillBuffer()
 * Summary:  Read fff input from stream and parse into kvtoken.h codes
 *****************************************************************/
int pascal _export fffFillBuffer(
    void *pCFContext,
    BYTE *pcBuf,
    UINT *pnBufOut,
    int  *pnPercentDone,
    UINT cbBufOutMax )
{
    BOOL bRetVal;
    TPfffGlobals *pContext = (TPfffGlobals *)pCFContext;

    pContext->pcBufOut = pcBuf;
    fffReadSourceFile(pContext);
    bRetVal = fffProcessBuffer(pContext, pcBuf);
    *pnPercentDone = (int)(pContext->unTotalBytesProcessed * (UINT)100 /
                           pContext->unFileSize);
    *pnBufOut = (UINT)(pContext->pcBufOut - pcBuf);
    return (bRetVal ? KVERR_Success : KVERR_General);
}
The parameters in fffFillBuffer() are as follows:

Parameter      In/Out  Description
pCFContext     In      A pointer to the context structure of the custom reader.
pcBuf          In/Out  A pointer to the token output buffer.
pnBufOut       Out     A pointer to the number of bytes written to the output buffer.
pnPercentDone  Out     A pointer to the percentage complete.
cbBufOutMax    In      The maximum number of bytes that the token output buffer can hold.

Structure of Implementation 1
1. The local variable pContext is set to the value of the pCFContext void pointer, cast to a pointer to the global context structure for the reader. This provides access to all members of this structure.
2. After setting the pContext variable, a call is made to read the source file.
3. Next, a call is made to fffProcessBuffer(). The second parameter in the call is a pointer to the token output buffer. If this call fails, usually because of memory allocation errors, it returns FALSE.
4. The percentage complete is calculated.
5. The number of bytes written to the token output buffer is calculated. This is based on the value of pContext->pcBufOut, which is increased each time a token is written to the buffer.
6. The function returns to the structured access layer.
7. Subsequent calls to fffFillBuffer() are made by the structured access layer until the percentage complete is 100.
Problems with Implementation 1
• There is a limit to the size of the token output buffer, typically 4 KB. If fffProcessBuffer() generates a token stream larger than this, there is a memory overflow. If fffProcessBuffer() generates a small token stream and the entire file has not been read, the output token buffer is underutilized.
• It might not be possible to process the entire input buffer from the source file because of boundary conditions. An example of a "boundary condition" is when the input buffer terminates part way through a control sequence in the original document. Another file read operation is required before the complete control sequence can be parsed.
• This function might be interrupted by other calls from the structured access layer to process headers, footers, footnotes, and endnotes, or to retrieve the document summary information. This can cause values of variables in the global context to change, and the source file to be repositioned.


Implementation 2--Processing a Large Token Stream

Implementation 2 addresses the problem of processing a token stream that is larger than the output buffer size limit.

/*****************************************************************
 * Function: fffFillBuffer()
 * Summary:  Read fff input from stream and parse into kvtoken.h codes
 *****************************************************************/
int pascal _export fffFillBuffer(
    void *pCFContext,
    BYTE *pcBuf,
    UINT *pnBufOut,
    int  *pnPercentDone,
    UINT cbBufOutMax )
{
    BOOL bRetVal = TRUE;
    TPfffGlobals *pContext = (TPfffGlobals *)pCFContext;

    pContext->pcBufOut = pcBuf;
    pContext->cbBufOutMax = 9 * cbBufOutMax / 10;

    /* Process the portion of the fff file that is in the input buffer but do
     * not return from the fffFillBuffer() function unless the output buffer is
     * at least 90% full. If any of the memory allocations fail during the
     * execution of fffProcessBuffer(), bRetVal will be set to FALSE, resulting
     * in this conversion failing "gracefully".
     */
    do
    {
        if( pContext->bBufOutFull )
        {
            pContext->bBufOutFull = FALSE;
        }
        else
        {
            fffReadSourceFile(pContext);
        }
        bRetVal = fffProcessBuffer(pContext, pcBuf);
        *pnPercentDone = (int)(pContext->unTotalBytesProcessed * (UINT)100 /
                               pContext->unFileSize);
    } while( bRetVal && !pContext->bBufOutFull && *pnPercentDone < 100 );

    *pnBufOut = (UINT)(pContext->pcBufOut - pcBuf);
    return (bRetVal ? KVERR_Success : KVERR_General);
}

Structure of Implementation 2
1. cbBufOutMax is used to set pContext->cbBufOutMax. This is used in fffProcessBuffer() to monitor how full the token output buffer becomes as the source file is processed.

2. When the source file input buffer has been processed, fffProcessBuffer() returns, and the percentage complete is calculated.
3. If the token output buffer is not filled to a value greater than pContext->cbBufOutMax, pContext->bBufOutFull remains set to FALSE, and if the percentage complete is less than 100, the do-while loop is re-entered without returning from this function to the structured access layer. There is another call to fffReadSourceFile(), followed by fffProcessBuffer().
4. When the token output buffer is filled to a value greater than pContext->cbBufOutMax, pContext->bBufOutFull is set to TRUE. In this case, the do-while loop ends, the number of bytes written to the token output buffer is calculated, and control returns to the structured access layer.
5. The structured access layer continues to make calls to fffFillBuffer() until the entire source file is processed.
6. Each time the structured access layer calls fffFillBuffer(), another empty token output buffer is provided for the custom reader to use.
7. If the previous call to fffFillBuffer() exited because the previous token output buffer exceeded allowable capacity, pContext->bBufOutFull is reset to FALSE and no call is made to read the next buffer from the input source file.
Problems with Implementation 2
• It might not be possible to process the entire input buffer from the source file because of boundary conditions.
• This function might be interrupted by other calls from the structured access layer to process headers, footers, footnotes, or endnotes, or to retrieve the document summary information. This can cause values of variables in the global context to change, and the source file to be repositioned.
Boundary Conditions
A boundary condition can result from many situations arising from input file processing. For example, the input buffer might end with an incomplete command. In Folio flat files, this could be an incomplete element. In other word processing documents, a boundary condition might result from an incomplete control sequence, a split double-byte character, or a partial UTF-7 or UTF-8 sequence. These can be handled jointly by fffProcessBuffer(), which must detect the boundary condition, and fffReadSourceFile().
The following example shows partial code used in fffReadSourceFile():
/****************************************************************
 *
 * Function: fffReadSourceFile()
 *
 ***************************************************************/
int pascal fffReadSourceFile(TPfffGlobals *pContext)
{
    int nBytes;

    /* Transfer remaining data to beginning of buffer prior to next read.
     * memmove() is used because the source and destination both lie inside
     * cInputBuf and can overlap. */
    if( pContext->nResidualBytes )
    {
        memmove(pContext->cInputBuf, pContext->pcBufIn, pContext->nResidualBytes);
    }

    /* Read from file, without over-writing any text from the previous buffer */
    nBytes = (*pContext->pIO->kwReadFunc)(pContext->pIO,
                 pContext->cInputBuf + pContext->nResidualBytes,
                 BUFFERSIZE - pContext->nResidualBytes);

    /* Update input buffer control parameters */
    pContext->unTotalBytesRead += (UINT)nBytes;
    pContext->pcBufIn = pContext->cInputBuf;
    pContext->pcBufInMax = pContext->pcBufIn + pContext->nResidualBytes + nBytes;
    pContext->nResidualBytes = 0;
    return nBytes;
}
If fffProcessBuffer() is unable to process the entire input source file buffer, it sets the value for pContext->nResidualBytes. When the next call to fffReadSourceFile() is made, any residual bytes are copied to the beginning of the input source file buffer, and the number of bytes to be read is reduced to make sure that this buffer does not overflow.
A good way to test the code for boundary conditions is to vary the size of BUFFERSIZE and make sure that the results remain consistent.
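The residual-byte technique, and the advice to vary the buffer size, can be tried out with a small self-contained sketch. It is not the foliosr code: the DemoInput structure, the demo_* functions, and the toy "<...>" element syntax are hypothetical, and the source file is replaced by an in-memory string. The buffer capacity is a run-time parameter so that the same input can be parsed with several sizes.

```c
#include <string.h>

#define DEMO_MAXBUF 64

typedef struct {
    const char *src;          /* in-memory stand-in for the source file */
    size_t srcLen, srcPos;
    char   buf[DEMO_MAXBUF];
    size_t cap;               /* buffer size in use (plays the role of BUFFERSIZE) */
    size_t bufLen;            /* valid bytes in buf */
    size_t residual;          /* unprocessed bytes left at the end of buf */
} DemoInput;

/* Moves residual bytes to the front, then tops up the buffer from the source. */
static size_t demo_read(DemoInput *in)
{
    size_t room, n;
    if (in->residual)
        memmove(in->buf, in->buf + in->bufLen - in->residual, in->residual);
    room = in->cap - in->residual;
    n = in->srcLen - in->srcPos;
    if (n > room) n = room;
    memcpy(in->buf + in->residual, in->src + in->srcPos, n);
    in->srcPos += n;
    in->bufLen = in->residual + n;
    in->residual = 0;
    return n;
}

/* Counts complete "<...>" elements. An element cut off by the end of the
 * buffer is carried over to the next read as residual bytes. */
static int demo_count_elements(const char *text, size_t cap)
{
    DemoInput in = {0};
    int count = 0;
    in.src = text;
    in.srcLen = strlen(text);
    in.cap = (cap > DEMO_MAXBUF) ? DEMO_MAXBUF : cap;
    for (;;) {
        size_t i = 0, start;
        size_t n = demo_read(&in);
        if (in.bufLen == 0) break;
        while (i < in.bufLen) {
            if (in.buf[i] != '<') { i++; continue; }
            start = i;
            while (i < in.bufLen && in.buf[i] != '>') i++;
            if (i == in.bufLen) {          /* boundary: element is incomplete */
                in.residual = in.bufLen - start;
                break;
            }
            count++;
            i++;
        }
        if (n == 0) break;   /* source exhausted, or no room left to read more */
    }
    return count;
}
```

Calling demo_count_elements() on the same text with different capacities (each at least as long as the longest element) should return the same count; a mismatch would point to a boundary-handling bug.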
NOTE: With fffReadSourceFile(), the source file can be read by calls to retrieve header or footer information. If this occurs, the value for pContext->unTotalBytesRead is incorrect.
Implementation 3--Interrupting Structured Access Layer Calls
Implementation 3 addresses the problem of boundary conditions and interrupting calls from the structured access layer.
/****************************************************************************
 * Function: fffFillBuffer()
 * Summary:  Read fff input from stream and parse into kvtoken.h codes
 ****************************************************************************/
int pascal _export fffFillBuffer(
    void *pCFContext,
    BYTE *pcBuf,
    UINT *pnBufOut,
    int  *pnPercentDone,
    UINT cbBufOutMax )
{
    double dTotalBytesProcessed, dFileSize;
    BOOL bRetVal = TRUE;
    TPfffGlobals *pContext = (TPfffGlobals *)pCFContext;

    pContext->pcBufOut = pcBuf;
    pContext->cbBufOutMax = 9 * cbBufOutMax / 10;

    /* Process the portion of the fff file that is in the input buffer but do
     * not return from the fffFillBuffer() function unless the output buffer is
     * at least 90% full. If any of the memory allocations fail during the
     * execution of fffProcessBuffer(), bRetVal will be set to FALSE, resulting
     * in this conversion failing "gracefully".
     */
    do
    {
        if( pContext->bBufOutFull )
        {
            pContext->bBufOutFull = FALSE;
        }
        else
        {
            fffReadSourceFile(pContext);
        }
        bRetVal = fffProcessBuffer(pContext, pcBuf);
        if( pContext->bHeaderCompleted )
        {
            *pnPercentDone = 100;
            pContext->bHeaderCompleted = FALSE;
        }
        else if( pContext->bFooterCompleted )
        {
            *pnPercentDone = 100;
            pContext->bFooterCompleted = FALSE;
        }
        else
        {
            if( pContext->unTotalBytesProcessed >= pContext->unFileSize )
            {
                *pnPercentDone = 100;
            }
            else if( pContext->unFileSize < FFF_MAX_ULONG )
            {
                *pnPercentDone = (int)(pContext->unTotalBytesProcessed *
                                       (UINT)100 / pContext->unFileSize);
            }
            else
            {
                dTotalBytesProcessed = pContext->unTotalBytesProcessed;
                dFileSize = pContext->unFileSize;
                *pnPercentDone = (int)(dTotalBytesProcessed * 100 / dFileSize);
            }
        }
    } while( bRetVal && !pContext->bBufOutFull && *pnPercentDone < 100 );

    *pnBufOut = (UINT)(pContext->pcBufOut - pcBuf);
    return (bRetVal ? KVERR_Success : KVERR_General);
}

Structure of Implementation 3
• The most significant change in Implementation 3 is the addition of the code that checks whether the processing of the header or footer is complete. The variables pContext->bHeaderCompleted and pContext->bFooterCompleted are set to TRUE in fffProcessBuffer() when a header or footer is processed and the end of that portion of the document is reached.
• The other piece of code added in Implementation 3 is unique to foliosr. Folio files can be 50 MB or larger. Multiplying the byte count by 100 in an unsigned integer can therefore overflow, so the percentage complete cannot be calculated accurately that way. If the file size exceeds FFF_MAX_ULONG, which is defined as (UINT)(0xFFFFFFFF / 0x64), doubles are used for that calculation.
• Prior to returning, the token output buffer is as full as possible and never overflows. The minimum number of calls is made.
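The overflow that the FFF_MAX_ULONG check guards against can be reproduced in isolation. The following sketch assumes 32-bit unsigned arithmetic, as the definition (UINT)(0xFFFFFFFF / 0x64) suggests; DEMO_MAX_SAFE, demo_percent_naive(), and demo_percent() are hypothetical names, not part of the SDK.

```c
#include <stdint.h>

#define DEMO_MAX_SAFE (UINT32_MAX / 100u)   /* same idea as FFF_MAX_ULONG */

/* Naive version: processed * 100 wraps around once processed exceeds
 * DEMO_MAX_SAFE. The caller must ensure fileSize is not zero. */
static int demo_percent_naive(uint32_t processed, uint32_t fileSize)
{
    return (int)(processed * 100u / fileSize);
}

/* Guarded version: integer arithmetic for small files, double otherwise. */
static int demo_percent(uint32_t processed, uint32_t fileSize)
{
    if (fileSize == 0 || processed >= fileSize)
        return 100;
    if (fileSize < DEMO_MAX_SAFE)
        return (int)(processed * 100u / fileSize);
    return (int)((double)processed * 100.0 / (double)fileSize);
}
```

With a 100,000,000-byte file that is half processed, the guarded version reports 50, while the naive version reports a much smaller value because the intermediate product no longer fits in 32 bits.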
Development Tips
• Avoid unnecessary initialization.
The context variable is allocated in fpAllocateContext(). Immediately memset() this structure to zero. This sets all BOOL values to FALSE, all pointers to NULL, and all integers to 0. Only members that need non-zero or non-NULL values, and BOOLs that must be TRUE, then need to be initialized. This is best done in fpInitDoc().
• Know where you are in the input source file.
If you are processing headers, footers, notes, or (in the case of rtfsr) tables, you must be able to reposition the file pointer as required.
• Check buffer boundaries continuously.
Whenever you advance through the buffer, you need to know whether there is enough of the input stream to completely process the current command. If not, you need to append the next section of the input file before continuing.
• Strive for a "clean" token stream.
Use filtertest with the -d command-line option to generate a token version of the document. If there are redundant tokens, the reader is producing an inefficient token stream. You can keep the token stream free from redundancies by storing the state of the document and then applying the changes only when content is encountered. Content can be text, tabs, or picture objects. The filtertest.exe is in the directory install\samples\utf8\bin, where install is the path name of the Filter installation directory.
• Avoid large switch() statements whenever possible.
They make both development and debugging more complicated than necessary. If there is a fixed set of commands, consider using a hash table that enables you to quickly identify a pointer to the function that handles that command.
• Filtering document metadata is a separate process.
Remember that fpGetSummaryInfo() is a completely separate process from the rest of your code. It creates its own context variable structure. It does not have to call fpFillBuffer().
• Use caution when processing headers, footers, and notes.
If you need to process these items, the structured access layer calls fpOpenStream() and fpCloseStream(). It is critical that you save the state of your document and the file pointer position prior to returning from fpOpenStream(). Prior to returning from fpCloseStream(), you must restore the file pointer and the previous state of your document.
• Test your code.
The structured access layer for each SDK is unique. Test your code in Filter SDK, Export SDK, and Viewing SDK.
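The tip about replacing large switch() statements with a table of function pointers can be sketched as follows. For brevity, this example uses a sorted array searched with the standard bsearch() function rather than a hash table; the lookup idea (command name to handler pointer) is the same. The command names, DemoState, and the demo_* handlers are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

typedef struct { int bold; int paragraphs; } DemoState;
typedef void (*DemoHandler)(DemoState *);

static void demo_bold_on(DemoState *s)  { s->bold = 1; }
static void demo_bold_off(DemoState *s) { s->bold = 0; }
static void demo_para(DemoState *s)     { s->paragraphs++; }

typedef struct { const char *name; DemoHandler handler; } DemoCommand;

/* Must stay sorted by name (strcmp order) for bsearch(). */
static const DemoCommand demo_commands[] = {
    { "b",   demo_bold_on  },
    { "b0",  demo_bold_off },
    { "par", demo_para     },
};

static int demo_cmp(const void *key, const void *elem)
{
    return strcmp((const char *)key, ((const DemoCommand *)elem)->name);
}

/* Looks up the handler for a command name and calls it.
 * Returns 1 if the command was handled, 0 if it is unknown. */
static int demo_dispatch(DemoState *s, const char *name)
{
    const DemoCommand *c = bsearch(name, demo_commands,
                                   sizeof demo_commands / sizeof demo_commands[0],
                                   sizeof demo_commands[0], demo_cmp);
    if (c == NULL)
        return 0;
    c->handler(s);
    return 1;
}
```

Adding a command then means adding one table row and one handler function, instead of another case in a growing switch().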

Functions
This section describes the functions used by custom readers to manage the source file and generate token streams required to convert a document.

xxxsrAutoDet()
This function analyzes the source document and determines whether the detected file format requires the custom reader. It is called only when the [CustomFilters] section of the formats.ini file contains an entry identifying the complete file name of the custom reader. For more information on the formats.ini file, see File Format Detection, on page 234.

Syntax

Bool pascal _export xxxsrAutoDet(
    adTPDocInfo *pTPDocInfo,
    KPTPIOobj   *pIO)

Arguments

pTPDocInfo  A pointer to the adTPDocInfo structure provided by the structured access layer.
pIO         A pointer to the I/O stream object for the document processed.

Returns
• TRUE if the file format matches that of the custom reader.
• FALSE if the file format does not match that of the custom reader.
Discussion
• Typically, only the first 1 KB of the file is read into a buffer and analyzed to determine if it matches the file format of the custom reader. If a match is determined, the following four members of the adTPDocInfo structure must be assigned before returning TRUE:

adClass    Must be set to 1.
adFormat   A numerical value assigned to this reader in the [Formats] section of the formats.ini file.
descStr    A string describing the file format.
mMnmemStr  The initial part of the custom reader file name with the "sr" excluded.

• If the return value is TRUE, the custom reader is used to parse the file and generate the token stream.
• If the return value is FALSE, all other readers in the [CustomFilters] section of the formats.ini file are tried. If no match is found, the file detection process continues checking for the formats supported by Filter SDK.
• The entry in the [Formats] section of the formats.ini file should be of the form aaa.bbb.ccc.ddd, where aaa is the value used for the adFormat parameter, bbb is the value of the file class, ccc is the value of the minor format, and ddd is the value of the major version.
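To make the aaa.bbb.ccc.ddd layout concrete, the following sketch splits an entry such as 456.1.0.0=utf8 into its four numeric fields and the reader name. It is only an illustration of the entry format described above; DemoFormatEntry and demo_parse_entry() are hypothetical, and this is not how KeyView itself reads formats.ini.

```c
#include <stdio.h>

typedef struct {
    int  format;        /* aaa: value used for adFormat */
    int  fileClass;     /* bbb: file class */
    int  minorFormat;   /* ccc: minor format */
    int  majorVersion;  /* ddd: major version */
    char name[32];      /* text after '=', for example "utf8" */
} DemoFormatEntry;

/* Returns 1 if the line matches aaa.bbb.ccc.ddd=name, 0 otherwise. */
static int demo_parse_entry(const char *line, DemoFormatEntry *e)
{
    return sscanf(line, "%d.%d.%d.%d=%31s",
                  &e->format, &e->fileClass,
                  &e->minorFormat, &e->majorVersion, e->name) == 5;
}
```

For the UTF-8 reader example, the parsed format value is 456 and the file class is 1, which matches the requirement that adClass be set to 1.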

xxxAllocateContext()
This function allocates a global memory block for a data context. A handle to this memory is returned to the structured access layer. The structured access layer passes this handle back to all reader entry points.

Syntax

void * pascal _export xxxAllocateContext(
    void            *pSALContext,
    LPARAM          (pascal *fp)(void *, UINT LPARAM),
    Bool            *pbOpenDoc,
    TPVAPIServices  *pVapi,
    DWORD           dwFlags)

Arguments

pSALContext  A pointer to the global data context structure of the structured access layer.
fp           A pointer to a structure of callback functions supported by the structured access layer.
pbOpenDoc    You must set this BOOL value to TRUE if the allocation of memory for the global data context structure is successful.
pVapi        A pointer to a structure providing memory management and character conversion functions. Because this functionality is proprietary to Micro Focus, TPVAPIServices is redefined as void in kvcfsr.h.
dwFlags      Run-time flags controlled by the structured access layer.

Returns
• Upon success, a pointer to the global data context structure for the custom reader. This pointer is passed back to all other custom reader entry points.
• Upon error, a NULL pointer. This causes the structured access layer to shut down the process.
Discussion
The global context structure should be memset() to zero in this function.

xxxFreeContext()
This function terminates an instance of the custom reader.
Syntax
int pascal _export xxxFreeContext(void *pCFContext)
Arguments
pCFContext A pointer to the global context structure for the custom reader.

Returns
• Upon success, KVERR_Success.
• Upon error, a non-zero error code.
Discussion
All memory that still remains allocated within the custom reader must be freed within this function.

xxxInitDoc()
This function initializes the members of the global context structure that require non-zero or non-NULL values.

Syntax

int pascal _export xxxInitDoc(
    void       *pCFContext,
    adDocDesc  *pAutoInfo,
    long       lcbFileSize,
    KPTPIOobj  *pIO )

Arguments

pCFContext   A pointer to the global context structure for the custom reader.
pAutoInfo    A pointer to an adDocDesc structure defined in kwautdef.
lcbFileSize  The length of the source file in bytes.
pIO          A pointer to a KPTPIOobj structure defined in kvioobj.h.

Returns
• Upon success, KVERR_Success.
• Upon error, a non-zero error code. This causes the structured access layer to shut down the process.
Discussion
• For custom readers, the pAutoInfo variable can be ignored.
• If the structured access layer has determined the length of the source file, that value is provided by the lcbFileSize parameter. If it is zero, the file size must be determined in this function.
• The pointer pIO provides access to file management functions defined in kvioobj.h.
• In this function, all non-zero, non-NULL members of the global context structure should be initialized.

xxxFillBuffer()
This function controls parsing of the source file and generation of tokens defined in kvtoken.h.

Syntax

int pascal _export xxxFillBuffer(
    void *pCFContext,
    BYTE *pcBuf,
    UINT *pnBufOut,
    int  *pnPercentDone,
    UINT cbBufOutMax)

Arguments

pCFContext     A pointer to the global context structure for the custom reader.
pcBuf          A pointer to a memory buffer to which the tokens are written.
pnBufOut       A pointer to a variable that specifies the actual number of bytes written to the token buffer.
pnPercentDone  A pointer to a variable that specifies the percentage completed of the file parsing.
cbBufOutMax    The maximum number of bytes that can be written to the token buffer.

Returns
• Upon success, KVERR_Success.
• Upon error, a non-zero error code. This causes the structured access layer to shut down the process.
Discussion
• Calls are made to read and parse the source file within this function.
• This function is called repeatedly by the structured access layer until either an error code is returned or the percentage complete is 100.
• The actual number of bytes written to the token buffer must not exceed the value of cbBufOutMax.

xxxGetSummaryInfo()
This function is required to extract document summary information.

Syntax

int pascal _export xxxGetSummaryInfo(
    void             *pCFContext,
    KVSummaryInfoEx  *pInfo,
    BOOL             bFreeInfo)

Arguments

pCFContext  A pointer to the global context structure for the custom reader.
pInfo       A pointer to a KVSummaryInfoEx structure defined in kvtypes.h.
bFreeInfo   A BOOL value indicating whether to free memory allocated for summary information.


Returns
• Upon success, KVERR_Success.
• Upon error, a non-zero error code.
Discussion
This function uses an instance of the global context structure that is different from the one used by all other reader interface functions. This function can call the same functions used by xxxFillBuffer() or can be completely independent. For more information, see Extract Metadata, on page 59.

xxxOpenStream()
This function is required when initiating processing of peripheral elements such as document headers, footers, footnotes, and endnotes.

Syntax

int pascal _export xxxOpenStream(
    void *pCFContext,
    int  type,
    int  nOrdinal)

Arguments

pCFContext  A pointer to the global context structure for the custom reader.
type        An integer identifying a specific header, footer, footnote, or endnote. Options are defined in kvcfsr.h.
nOrdinal    An integer identifying a specific header, footer, footnote, or endnote. See the associated macros in kvtoken.h.

Returns
• Upon success, KVERR_Success.
• Upon error, a non-zero error code.
Discussion
A call to this function results in a call to xxxFillBuffer(). The function xxxFillBuffer() provides a new empty output buffer and a new token stream input buffer to process the alternate stream for peripheral elements. In this alternate stream, paragraph and character style properties are likely different from the main body. Therefore, as the document is parsed, the existing values from the main body must be saved. When the processing of the alternate stream is completed and processing of the main body resumes, these values must be restored in xxxCloseStream().

xxxCloseStream()
This function is required when terminating processing for document headers, footers, footnotes, and endnotes.

Syntax

int pascal _export xxxCloseStream(
    void *pCFContext,
    int  type)

Arguments

pCFContext  A pointer to the global context structure for the custom reader.
type        An integer identifying a specific header, footer, footnote, or endnote. Options are defined in kvcfsr.h.

Returns
• Upon success, KVERR_Success.
• Upon error, a non-zero error code.
Discussion
Prior to exiting this function, the previously saved values in the global context structure must be restored. This ensures that processing of the main body resumes with the correct document state.
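The save-and-restore pattern required by xxxOpenStream() and xxxCloseStream() can be sketched as follows. The structures and function names are hypothetical, and the document state is reduced to three illustrative members; a real reader would save whatever state and file position its own context structure tracks.

```c
typedef struct {
    int  fontSize;
    int  bold;
    long filePos;
} DemoDocState;

typedef struct {
    DemoDocState current;      /* state used while emitting tokens */
    DemoDocState saved;        /* main-body state held during an alternate stream */
    int          inAltStream;
} DemoStreamContext;

/* Saves the main-body state, then starts the alternate stream (for example a
 * header) with default properties. Returns 0 on success, non-zero on error. */
static int demo_open_stream(DemoStreamContext *c, long streamPos)
{
    if (c->inAltStream)
        return 1;                      /* nested streams are not handled here */
    c->saved = c->current;
    c->current.fontSize = 10;          /* illustrative defaults */
    c->current.bold = 0;
    c->current.filePos = streamPos;
    c->inAltStream = 1;
    return 0;
}

/* Restores the main-body state and file position saved by demo_open_stream(). */
static int demo_close_stream(DemoStreamContext *c)
{
    if (!c->inAltStream)
        return 1;
    c->current = c->saved;
    c->inAltStream = 0;
    return 0;
}
```

After demo_close_stream() returns, the current state is identical to what it was before demo_open_stream() was called, regardless of what changed while the alternate stream was processed.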

xxxCharSet()
This function identifies the character encoding used within the source document.
Syntax
KVCharSet pascal _export xxxCharSet(
    void *pCFContext,
    BOOL *bMSBLSB)

Arguments

pCFContext  A pointer to the global context structure for the custom reader.
bMSBLSB     The BOOL value required for Unicode text. Set this argument to TRUE for Big Endian and FALSE for Little Endian.

Returns
One of the enumerated values defined in the KVCharSet structure in kvcharset.h.
Discussion
If the custom reader can determine the character encoding of the document, the corresponding enumerated value is returned. If the character encoding cannot be determined, KVCS_UNKNOWN is returned.
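One common way for a reader to determine the encoding is to inspect a byte-order mark at the start of the file. The sketch below is a minimal illustration under that assumption; the DemoCharSet enumeration and demo_charset() are hypothetical and do not use the real KVCharSet values from kvcharset.h. The bigEndian flag follows the same convention as bMSBLSB (non-zero for Big Endian).

```c
#include <stddef.h>

typedef enum { DEMO_CS_UNKNOWN, DEMO_CS_UTF8, DEMO_CS_UTF16 } DemoCharSet;

/* Inspects a byte-order mark. *bigEndian is written only for UTF-16.
 * Limitation: FF FE is also the start of a UTF-32 little-endian mark,
 * which this sketch does not distinguish. */
static DemoCharSet demo_charset(const unsigned char *p, size_t n, int *bigEndian)
{
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
        return DEMO_CS_UTF8;
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF) {
        *bigEndian = 1;
        return DEMO_CS_UTF16;
    }
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE) {
        *bigEndian = 0;
        return DEMO_CS_UTF16;
    }
    return DEMO_CS_UNKNOWN;
}
```

When no mark is present, the function reports the unknown value, mirroring the KVCS_UNKNOWN case described above.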


Appendix I: Password Protected Files

This section lists supported password-protected container and non-container files and describes how to open them.

· Supported Password Protected File Types  269
· Open Password Protected Container Files  270
· Filter Password Protected Files  270

Supported Password Protected File Types

The following table lists the password-protected file types that KeyView supports.

Key to support table

Symbol  Description
Y       Format is supported.
N       Format is not supported.
S       Support for viewing subfiles.
V       Support for viewing content.
P       Password required.
C       Password and certificate or User ID file required.

Supported password-protected file types

File Type                Version              Filter  Export  Extract  View  Credentials
PST (Windows)            n/a                  N       N       Y        S     P
PST (non-Windows) (1)    n/a                  N       N       Y        S     N
ZIP                      n/a                  N       N       Y        S     P
7-Zip                    n/a                  N       N       Y        S     P
RAR                      n/a                  N       N       Y        S     P
SMIME in MSG, EML, MBX   n/a                  N       N       Y        N     C
Lotus Notes NSF          n/a                  N       N       Y        N     C
Adobe PDF                n/a                  Y       Y       Y        V     P
Microsoft Office         97-2003, 2007, 2010  Y       Y       Y        V     P

(1) The native PST readers, pstxsr and pstnsr, do not require credentials to open password-protected PST files that use compressible encryption.

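The Credentials column above can be encoded as a small lookup, so that an application knows what to request from the user before it opens a file. This is a minimal sketch: the class, enum, and method names are hypothetical and not part of the KeyView API, and the values are transcribed from the table as reconstructed here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical lookup built from the support table; not part of the KeyView API. */
final class ContainerCredentialSketch {

    /** Mirrors the Credentials column values N, P, and C. */
    enum Credential { NONE, PASSWORD, PASSWORD_AND_CERTIFICATE_OR_USER_ID }

    private static final Map<String, Credential> BY_FILE_TYPE = new LinkedHashMap<>();

    static {
        BY_FILE_TYPE.put("PST (Windows)", Credential.PASSWORD);
        // N in the table: see the footnote about the native PST readers.
        BY_FILE_TYPE.put("PST (non-Windows)", Credential.NONE);
        BY_FILE_TYPE.put("ZIP", Credential.PASSWORD);
        BY_FILE_TYPE.put("7-Zip", Credential.PASSWORD);
        BY_FILE_TYPE.put("RAR", Credential.PASSWORD);
        BY_FILE_TYPE.put("SMIME in MSG, EML, MBX", Credential.PASSWORD_AND_CERTIFICATE_OR_USER_ID);
        BY_FILE_TYPE.put("Lotus Notes NSF", Credential.PASSWORD_AND_CERTIFICATE_OR_USER_ID);
        BY_FILE_TYPE.put("Adobe PDF", Credential.PASSWORD);
        BY_FILE_TYPE.put("Microsoft Office", Credential.PASSWORD);
    }

    /** Returns the credential type listed for a file type, or throws if the type is not in the table. */
    static Credential requiredFor(String fileType) {
        Credential credential = BY_FILE_TYPE.get(fileType);
        if (credential == null) {
            throw new IllegalArgumentException("File type not listed in the support table: " + fileType);
        }
        return credential;
    }
}
```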

Open Password Protected Container Files
This section describes how to extract password-protected container files by using the Java API. The following guidelines apply to specific file types.
· Lotus Notes NSF files. If you are running a Notes client with an active user connected to a Domino server, you must specify the user's password as a credential regardless of whether the NSF files you are opening are protected. This enables KeyView to access the Notes client and the Lotus Notes API. If the Notes client is not running with an active user, KeyView does not require credentials to access the client.
· PST files. To open password-protected PST files that use high encryption (Microsoft Outlook 2003 only), you must use the MAPI-based PST reader (pstsr). The native PST readers (pstxsr and pstnsr) do not support files that use high encryption, and return the error message KVERR_PasswordProtected if a PST file is encrypted with high encryption.
To open container files
· Set the credential information in an ExtOpenDocConfig object, and pass it to the extOpenDocument method. For example:
    // Supply the password through the open-document configuration.
    odconfig = new ExtOpenDocConfig();
    odconfig.setPassword(m_password);
    // Open the container by using the configured credentials.
    extContextID = m_objFilter.extOpenDocument(inFile, odconfig);
Filter Password Protected Files
This section describes how to filter password-protected non-container files with the Java API.

To filter password-protected files
· Use the setSourcePassword(java.lang.String pwd) method. For example:
    objFilter.setSourcePassword(pwd);
  where pwd is a null-terminated string of 255 characters or fewer.
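Because the password is limited to 255 characters, an application might check the length before it calls setSourcePassword. The following is a minimal sketch of such a check. The class and method names are hypothetical, and the sketch assumes that the limit is counted in Java char units, which this guide does not state.

```java
/** Hypothetical pre-check for a source password; not part of the KeyView API. */
final class SourcePasswordCheck {

    /** Limit stated in this guide for setSourcePassword. */
    static final int MAX_PASSWORD_LENGTH = 255;

    /**
     * Returns the password unchanged if it is acceptable.
     * Assumption: the 255-character limit is measured with String.length().
     */
    static String requireValidPassword(String pwd) {
        if (pwd == null) {
            throw new IllegalArgumentException("Password must not be null");
        }
        if (pwd.length() > MAX_PASSWORD_LENGTH) {
            throw new IllegalArgumentException(
                    "Password is " + pwd.length() + " characters; the limit is " + MAX_PASSWORD_LENGTH);
        }
        return pwd;
    }
}
```

The checked value could then be passed on, for example objFilter.setSourcePassword(SourcePasswordCheck.requireValidPassword(pwd)).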


Appendix J: Microsoft Rights Management Service Protected Files

This section contains information about KeyView support for Microsoft Rights Management Service (RMS).

· Microsoft Azure Rights Management Service
· Supported Formats

Microsoft Azure Rights Management Service
The Microsoft Rights Management Service (RMS) allows you to classify and optionally encrypt documents. This service forms the rights management part of Microsoft Azure Information Protection (AIP).
For many of the files that Azure RMS can classify and encrypt, KeyView can identify whether they have been encrypted with RMS encryption. It can also extract metadata (including the RMS classification) and XrML associated with the document.
For the KeyView Filter Java SDK, you can provide the credentials required to access protected files by using the Filter.configureRMS() function. This function allows the Filter and File Extraction API functions to operate on the protected data of the file.
When you use Azure RMS decryption, consider the following notes:
· Azure RMS decryption is licensed as an additional product. If your license does not allow for Azure RMS decryption, this function throws a FilterException that returns KVError_ReaderUsageDenied from its getErrorCode() method.
· To access the protected content, KeyView must make an HTTP request. The time required to do so means that KeyView processes protected files more slowly than unprotected files.
· By default, KeyView uses the system proxy when it makes HTTP requests to obtain the key. You can also specify the proxy manually in the configuration file. See Configure the Proxy for RMS.
· This function is supported only on certain platforms. See RMS Decryption in the platform differences section.
CAUTION: When Filter or File Extraction API functions access the protected contents of Azure RMS-protected files, KeyView may place decrypted contents into the temporary directory. If you want to manage the security of such files, you might want to change the temporary directory, by using Filter.setConfigOption() with the Filter.CFG_SETTEMPDIRECTORY constant.
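One way to manage the security of decrypted temporary files is to give KeyView a dedicated directory that only the current user can access. The following sketch uses only the standard Java library to create such a directory. The class and method names are hypothetical, and the call to Filter.setConfigOption() is left out because its argument list is not shown in this section.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

/** Hypothetical helper that prepares a private temporary directory. */
final class PrivateTempDirSketch {

    /** Creates a new temporary directory, restricted to the owner where POSIX permissions are available. */
    static Path createPrivateTempDir(String prefix) {
        try {
            boolean posix = FileSystems.getDefault()
                    .supportedFileAttributeViews().contains("posix");
            if (posix) {
                // rwx for the owner only; no access for group or others.
                Set<PosixFilePermission> ownerOnly = PosixFilePermissions.fromString("rwx------");
                return Files.createTempDirectory(prefix, PosixFilePermissions.asFileAttribute(ownerOnly));
            }
            // On other file systems, fall back to the platform default permissions.
            return Files.createTempDirectory(prefix);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The resulting path would then be supplied through Filter.setConfigOption() with the Filter.CFG_SETTEMPDIRECTORY constant. The application remains responsible for deleting the directory when processing is complete.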

RMS Credentials
For KeyView to access the protected contents of Microsoft Azure Rights Management Service (RMS) protected files, your end-user application must be registered on the relevant Azure domain. For more information about how to register an app, refer to the Microsoft documentation: https://docs.microsoft.com/en-us/azure/active-directory/develop/quickstart-register-app. After you register an application, you can find the client ID and tenant ID in the Azure Portal, in the Overview section. You can find the client secret in the Certificates & Secrets section.
CAUTION: This information is linked to the domain itself, rather than to a specific user. Providing this information allows KeyView to access the contents of all files protected by this domain. Therefore you must handle these three pieces of information securely.
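As one possible way to handle these values securely, an application can read them from its environment instead of storing them in source code, and keep the secret out of log output. The following is a minimal sketch. The class name and the environment variable names are hypothetical, and the order in which Filter.configureRMS() expects the values is not shown in this section.

```java
import java.util.Map;

/** Hypothetical holder for the three RMS values; not part of the KeyView API. */
final class RmsCredentialsSketch {

    private final String tenantId;
    private final String clientId;
    private final String clientSecret;

    private RmsCredentialsSketch(String tenantId, String clientId, String clientSecret) {
        this.tenantId = tenantId;
        this.clientId = clientId;
        this.clientSecret = clientSecret;
    }

    /** Reads the values from a map such as System.getenv(). Variable names are assumptions. */
    static RmsCredentialsSketch fromEnvironment(Map<String, String> env) {
        return new RmsCredentialsSketch(
                require(env, "KV_RMS_TENANT_ID"),
                require(env, "KV_RMS_CLIENT_ID"),
                require(env, "KV_RMS_CLIENT_SECRET"));
    }

    private static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing environment variable " + name);
        }
        return value;
    }

    String tenantId() { return tenantId; }
    String clientId() { return clientId; }
    String clientSecret() { return clientSecret; }

    /** Masks the secret so that it does not appear in logs by accident. */
    @Override
    public String toString() {
        return "RmsCredentialsSketch[tenantId=" + tenantId + ", clientId=" + clientId + ", clientSecret=***]";
    }
}
```

A caller would typically use RmsCredentialsSketch.fromEnvironment(System.getenv()) and pass the three values to Filter.configureRMS() as described in the API reference.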
Supported Formats
KeyView support for Azure RMS files depends on the encryption method that Azure RMS uses for each file type, and on whether the file is classified or protected. In Azure RMS, classified files have additional labels to inform users of their sensitivity, while protected files are encrypted so that only authorized users can view them. In some cases, KeyView format detection returns a different file type depending on whether the file is classified or protected. The following sections provide information about the Azure RMS support for different file types, and metadata support.
Microsoft Office Files
The following table describes KeyView detected formats for Microsoft Office files that Azure RMS encrypts by creating an OLE container. For these files:
· KeyView can get classification metadata.
· KeyView can detect whether the file is Azure RMS encrypted (the kWindowsRMSEncrypted flag).
· When you configure credentials through Filter.configureRMS(), Filter and File Extraction API functions can operate on the protected data of the file. In this case, you can filter, extract, and get summary information.
In most cases, KeyView can also extract the XrML file for these files when they are protected, and identify the XrML files as KVSubFileType_XrML.


File extensions     Format detected when file is        Format detected when  XrML
                    classified but not protected        file is protected     extraction

docx, dotx          MS_Word_2007_Fmt                    MS_Office_2007_Fmt    Yes

docm, dotm          MS_Word_Macro_2007_Fmt              MS_Office_2007_Fmt    Yes

pptx, potx, ppsx    MS_PPT_2007_Fmt                     MS_Office_2007_Fmt    Yes

pptm, potm, ppsm    MS_PPT_Macro_2007_Fmt               MS_Office_2007_Fmt    Yes

vsdx                MS_Visio_2013_Fmt                   MS_Office_2007_Fmt    Yes

vsdm, vssm, vssx,   MS_Visio_2013_Macro_Fmt             MS_Office_2007_Fmt    Yes
vstm, vstx          MS_Visio_2013_Stencil_Fmt
                    MS_Visio_2013_Stencil_Macro_Fmt
                    MS_Visio_2013_Template_Fmt
                    MS_Visio_2013_Template_Macro_Fmt

xlsx, xltx          MS_Excel_2007_Fmt                   MS_Office_2007_Fmt    Yes

xlsm, xlsb, xltm    MS_Excel_Macro_2007_Fmt             MS_Office_2007_Fmt    Yes
                    MS_Excel_Binary_2007_Fmt

xps                 MS_XPS_Fmt                          MS_Office_2007_Fmt    Yes

doc, dot            MS_Word_95_Fmt                      MS_Word_95_Fmt        Yes
                    MS_Word_97_Fmt                      MS_Word_97_Fmt
                    MS_Word_2000_Fmt                    MS_Word_2000_Fmt

ppt, pot, pps       PowerPoint_95_Fmt                   PowerPoint_95_Fmt     Yes
                    PowerPoint_97_Fmt                   PowerPoint_97_Fmt

xls, xla, xlam, xlt Excel_Fmt                           Excel_Fmt             Yes
                    Excel_Macro_Fmt                     Excel_Macro_Fmt
                    Excel_95_Fmt                        Excel_95_Fmt
                    Excel_97_Fmt                        Excel_97_Fmt
                    Excel_2000_Fmt                      Excel_2000_Fmt

Implemented as pFile
The following table describes the KeyView detected formats for files that Azure RMS encrypts by creating a pFile around the document. For these files:
· KeyView can get classification metadata.
· KeyView can detect whether the file is Azure RMS encrypted (the kWindowsRMSEncrypted flag).


· KeyView can extract the XrML if the file is protected.
· When you configure credentials through Filter.configureRMS(), Filter and File Extraction API functions can operate on the protected data of the file. In this case, you can filter, extract, and get summary information.

File extensions   Format detected when file   Format detected     Notes
                  is classified but not       when file is
                  protected                   protected

pfile             n/a                         RMS_Protected_Fmt

vsd               MS_Visio_Fmt                RMS_Protected_Fmt

vdw, vss, vst     MS_Visio_Fmt                RMS_Protected_Fmt

mpp, mpt          MS_Project_4_Fmt            RMS_Protected_Fmt
                  MS_Project_41_Fmt
                  MS_Project_98_Fmt
                  MS_Project_2000_Fmt
                  MS_Project_2007_Fmt

pub               MS_Publisher_98_Fmt         RMS_Protected_Fmt

jpg               JPEG_File_Interchange_Fmt   RMS_Protected_Fmt   Protected format has extension pjpg.
                                                                  When classified but not protected, the
                                                                  classification metadata is XMP.

png               PNG_Fmt                     RMS_Protected_Fmt   Protected format has extension ppng.

gif               GIF_89a_Fmt                 RMS_Protected_Fmt   Protected format has extension pgif.
                                                                  When classified but not protected, the
                                                                  classification metadata is XMP.

tif               TIFF_Fmt                    RMS_Protected_Fmt   Protected format has extension ptif.
                                                                  When classified but not protected, the
                                                                  classification metadata is XMP.

dng               TIFF_Fmt                    RMS_Protected_Fmt   When classified but not protected, the
                                                                  classification metadata is XMP.

dwfx              MS_XPS_Fmt                  RMS_Protected_Fmt   When classified but not protected, dwfx
                                                                  is detected and treated as XPS.

psd, psb          PSD_Fmt                     RMS_Protected_Fmt   When classified but not protected, the
                                                                  classification metadata is XMP.
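The Notes column lists the extensions that protected image files receive (pjpg, ppng, pgif, and ptif). An application that wants to report the original type of such a file from its name could use a lookup like the following sketch. The class and method names are hypothetical. This is only a file-naming convenience and does not replace KeyView format detection, which reports RMS_Protected_Fmt for these files.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical mapping from protected image extensions to the original extensions. */
final class PFileExtensionSketch {

    private static final Map<String, String> PROTECTED_TO_ORIGINAL = new HashMap<>();

    static {
        // Pairs taken from the Notes column of the pFile table.
        PROTECTED_TO_ORIGINAL.put("pjpg", "jpg");
        PROTECTED_TO_ORIGINAL.put("ppng", "png");
        PROTECTED_TO_ORIGINAL.put("pgif", "gif");
        PROTECTED_TO_ORIGINAL.put("ptif", "tif");
    }

    /** Returns the original extension, or empty if the extension is not one of the listed protected forms. */
    static Optional<String> originalExtension(String extension) {
        if (extension == null) {
            return Optional.empty();
        }
        String key = extension.toLowerCase(Locale.ROOT);
        return Optional.ofNullable(PROTECTED_TO_ORIGINAL.get(key));
    }
}
```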

PDF Files
The following table describes the KeyView detected formats for PDF documents, which Azure RMS encrypts by creating an encrypted PDF (in which each stream and metadata value is encrypted), wrapped in a container PDF. KeyView allows you to extract the encrypted PDF from the container, and then for the extracted file:


· KeyView can detect whether the file is Azure RMS encrypted (the kWindowsRMSEncrypted flag).
· KeyView can extract the XrML if the file is protected.
· When you configure credentials through Filter.configureRMS(), Filter and File Extraction API functions can operate on the protected data of the file. In this case, you can filter, extract, and get summary information for PDF formats.

File extensions   Format detected when file is    Format detected when
                  classified but not protected    file is protected

pdf               PDF_Fmt                         PDF_Fmt
                  PDF_Portfolio_Fmt               PDF_Portfolio_Fmt

Restricted Permission Messages
Azure RMS encrypts an email message by placing the original message body and attachments in an encrypted rpmsg attachment. The rpmsg is attached to an unencrypted container message, which contains the message metadata. KeyView can extract the metadata and the encrypted rpmsg from the container message, and then for the extracted rpmsg:
· KeyView can detect that the file is Azure RMS encrypted (the kWindowsRMSEncrypted flag).
· When you configure credentials through Filter.configureRMS(), File Extraction API functions can operate on the protected data of the file. This allows you to extract the message body and attached files, but attached messages are not currently supported.
NOTE: Extraction of the XrML from the encrypted rpmsg is not supported.


Appendix K: OCR Supported Languages
KeyView OCR supports the following languages. In parentheses following each language name is the corresponding ISO 639-1 language code.

Latin Alphabet

Afrikaans (af) Basque (eu) Catalan (ca) Croatian (hr) Czech (cs) Danish (da) Dutch (nl) English (en)

Esperanto (eo) Estonian (et) Finnish (fi) French (fr) German (de) Hungarian (hu) Icelandic (is) Italian (it)

Irish (ga) Latin (la) Latvian (lv) Lithuanian (lt) Maltese (mt) Norwegian (no) Polish (pl) Portuguese (pt)

Romanian (ro) Slovak (sk) Slovenian (sl) Spanish (es) Swedish (sv) Turkish (tr) Welsh (cy)

Arabic Alphabet
Arabic (ar) Persian (fa) Urdu (ur)

Chinese Alphabet
Simplified Chinese (zhs) Traditional Chinese (zht)

Cyrillic Alphabet
Bulgarian (bg) Macedonian (mk) Russian (ru) Serbian (sr) Ukrainian (uk)

Other Alphabets
Greek (el) Hebrew (he) Japanese (ja) Korean (ko) Thai (th)


Send documentation feedback
If you have comments about this document, you can contact the documentation team by email. Send your feedback to swpdl.idoldocsfeedback@microfocus.com with the following subject line: Feedback on Micro Focus IDOL KeyView 12.12 Filter SDK Java Programming Guide. We appreciate your feedback!
