IDOL KeyView Filter SDK 12.13 Java Programming Guide
IDOL KeyView
Software Version 12.13

Filter SDK Java Programming Guide

Document Release Date: October 2022
Software Release Date: October 2022

Legal notices

© Copyright 2016-2022 Micro Focus or one of its affiliates.

The only warranties for products and services of Micro Focus and its affiliates and licensors ("Micro Focus") are as may be set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. Micro Focus shall not be liable for technical or editorial errors or omissions contained herein. The information contained herein is subject to change without notice.

Documentation updates

The title page of this document contains the following identifying information:
• Software Version number, which indicates the software version.
• Document Release Date, which changes each time the document is updated.
• Software Release Date, which indicates the release date of this version of the software.

To check for updated documentation, visit https://www.microfocus.com/support-and-services/documentation/.

Support

Visit the MySupport portal to access contact information and details about the products, services, and support that Micro Focus offers.

This portal also provides customer self-solve capabilities. It gives you a fast and efficient way to access interactive technical support tools needed to manage your business. As a valued support customer, you can benefit by using the MySupport portal to:
• View information about all services that Support offers
• Submit and track service requests
• Contact customer support
• Search for knowledge documents of interest
• View software vulnerability alerts
• Enter into discussions with other software customers
• Download software patches
• Manage software licenses, downloads, and support contracts

Many areas of the portal require you to sign in.
If you need an account, you can create one when prompted to sign in.

Contents

Part I: Overview of Filter SDK

Chapter 1: Introducing Filter SDK
  Overview
  Features
  Platforms, Compilers, and Dependencies
    Supported Platforms
    Supported Compilers
    Software Dependencies
  Windows Installation
  UNIX Installation
  Package Contents
  License Information
    Enable Advanced Document Readers
    Pass License Information to KeyView
  Directory Structure

Chapter 2: Getting Started
  Architectural Overview
  File Caching
  Filtering
  Subfile Extraction
  Use the Java Implementation of the API
    Input/Output Operations
    Filter in File or Stream Mode
    Multithreaded Filtering
    Before Running Your Application
  The Filter Process Model
    Filter API
    File Extraction API
    Persist the Child Process
    In the API
    In the formats.ini File
    Run Filter In Process
    In the API
    In the formats.ini File
    Run File Extraction Functions Out of Process
    Restart the File Extraction Server
    Out-of-Process Logging
    Enable Out-of-Process Logging
    Set the Verbosity Level
    Enable Windows Minidump
    Keep Log Files
  Run File Detection In or Out of Process
    Specify the Process Type In the formats.ini File
    Specify the Process Type In the API
  Stream Data to Filter

Part II: Use Filter SDK

Chapter 3: Use the File Extraction API
  Introduction
  Extract Subfiles
  Sanitize Absolute Paths
  Extract Images
  Recreate a File Hierarchy
  Create a Root Node
  Example
  Extract Mail Metadata
  Default Metadata Set
  Extract the Default Metadata Set
  Extract All Metadata
  Microsoft Outlook (MSG) Metadata
  Extract MSG-Specific Metadata
  Microsoft Outlook Express (EML) and Mailbox (MBX) Metadata
  Extract EML- or MBX-Specific Metadata
  Lotus Notes Database (NSF) Metadata
  Extract NSF-Specific Metadata
  Microsoft Personal Folders File (PST) Metadata
  MAPI Properties
  Extract PST-Specific Metadata
  Exclude Metadata from the Extracted Text File
  Extract Subfiles from Outlook Files
  Extract Subfiles from Outlook Express Files
  Extract Subfiles from Mailbox Files
  Extract Subfiles from Outlook Personal Folders Files
  Choose the Reader to use for PST Files
  MAPI Attachment Methods
  Open Secured PST Files
  Detect PST Files While the Outlook Client is Running
  Extract Subfiles from Lotus Domino XML Language Files
  Extract .DXL Files to HTML
  Extract Subfiles from Lotus Notes Database Files
  System Requirements
  Installation and Configuration
  Windows
  Linux
  AIX 5.x
  Open Secured NSF Files
  Format Note Subfiles
  Extract Subfiles from PDF Files
  Improve Performance for PDFs with Many Small Images
  Extract Embedded OLE Objects
  Extract Subfiles from ZIP Files
  Default File Names for Extracted Subfiles
  Default File Name for Mail Formats
  Default File Name for Embedded OLE Objects

Chapter 4: Use the Filter API
  Generate an Error Log
  Enable or Disable Error Logging
  Change the Path and File Name of the Log File
  Report Memory Errors
  Specify a Memory Guard
  Report the File Name in Stream Mode
  Example
  Specify the Maximum Size of the Log File
  Extract Metadata
  Extract Metadata for File Filtering
  Extract Metadata for Stream Filtering
  Example
  Convert Character Sets
  Determine the Character Set of the Output Text
  Guidelines for Character Set Conversion
  Set the Character Set During Filtering
  Set the Character Set During Subfile Extraction
  Prevent the Default Conversion of a Character Set
  Extract Tracked Deleted Text
  Filter PDF Files
  Use the pdf2sr Reader
  Filter PDF Files to a Logical Reading Order
  Rotated Text
  Extract Custom Metadata from PDF Files
  Skip Embedded Fonts
  Control Hyphenation
  Filter Portfolio PDF Files
  Table Detection for PDF Files
  Filter Spreadsheet Files
  Filter Worksheet Names
  Filter Hidden Text in Microsoft Excel Files
  Specify Date and Time Format on UNIX Systems
  Filter Very Large Numbers in Spreadsheet Cells to Precision Numbers
  Extract Microsoft Excel Formulas
  Standardize Cell Formats
  Tab Delimited Empty Cells
  Filter Presentation Files to a Logical Reading Order
  Filter HTML Files
  Filter XML Files
  Configure Element Extraction for XML Documents
  Configure Headers and Footers
  Error Messages
  Tab Delimited Output for Spreadsheets and Embedded Tables
  Table Output for IDOL Eduction
  Exclude Japanese Guide Text
  Source Code Identification
  Optical Character Recognition
  Optimize OCR Performance
  Configure the Proxy for RMS
  Document Restrictions

Chapter 5: Sample Programs
  Introduction
  ExtractFilter
  FilterFileByChunk
  FilterFileToFile
  FilterFileToStream
  FilterStreamByChunk
  FilterStreamToFile
  FilterStreamToStream
  FilterTest

Part III: Appendixes

Appendix A: Supported Formats
  Key to Supported Formats Table
  Supported Formats
  File Classes

Appendix B: Document Readers
  Key to Document Readers Table
  Document Readers

Appendix C: Platform Differences
  Feature Differences
  Reader Differences

Appendix D: Character Sets
  Multibyte and Bidirectional Support
  Coded Character Sets

Appendix E: Extract and Format Lotus Notes Subfiles
  Overview
  Customize XML Templates
  Use Demo Templates
  Use Old Templates
  Disable XML Templates
  Template Elements and Attributes
  Conditional Elements
  Control Elements
  Data Elements
  Date and Time Formats
  Lotus Notes Date and Time Formats
  KeyView Date and Time Formats

Appendix F: File Format Detection
  Introduction
  Extract Format Information
  Determine Format Support
  Example formats.ini file entries
  Refine Detection of Text Files
  Allow Consecutive NULL Bytes in a Text File
  Translate Format Information
  Distinguish Between Formats
  Determine a Document Reader
  Additional Format Information

Appendix G: List of Required Files for Redistribution
  Core Files
  Support Files
  Document Readers

Appendix H: Develop a Custom Reader
  Introduction
  How to Write a Custom Reader
  Naming Conventions
  Basic Steps
  Token Buffer
  Macros
  Reader Interface
  Function Flow
  Example Development of fffFillBuffer()
  Implementation 1: fpFillBuffer() Function
  Structure of Implementation 1
  Problems with Implementation 1
  Implementation 2: Processing a Large Token Stream
  Structure of Implementation 2
  Problems with Implementation 2
  Boundary Conditions
  Implementation 3: Interrupting Structured Access Layer Calls
  Structure of Implementation 3
  Development Tips
  Functions
  xxxsrAutoDet()
  xxxAllocateContext()
  xxxFreeContext()
  xxxInitDoc()
  xxxFillBuffer()
  xxxGetSummaryInfo()
  xxxOpenStream()
  xxxCloseStream()
  xxxCharSet()

Appendix I: Password Protected Files
  Supported Password Protected File Types
  Open Password Protected Container Files
  Filter Password Protected Files

Appendix J: Microsoft Rights Management Service Protected Files
  Microsoft Azure Rights Management Service
  RMS Credentials
  Supported Formats
  Microsoft Office Files
  Implemented as pFile
  PDF Files
  Restricted Permission Messages

Appendix K: OCR Supported Languages

Send documentation feedback
Part I: Overview of Filter SDK

This section provides an overview of the Micro Focus KeyView Filter SDK and describes how to use the Java implementation of the API.
• Introducing Filter SDK
• Getting Started

Chapter 1: Introducing Filter SDK

This section describes the Filter SDK package.
• Overview
• Features
• Platforms, Compilers, and Dependencies
• Windows Installation
• UNIX Installation
• Package Contents
• License Information
• Directory Structure

Overview

Micro Focus KeyView Filter SDK enables you to incorporate text extraction functionality into your own applications. It extracts text and metadata from a wide variety of file formats on numerous platforms, and can automatically recognize over 1000 document types. It supports both file-based and stream-based I/O operations, and provides in-process or out-of-process filtering.

Filter SDK is part of the KeyView suite of products. KeyView provides high-speed text extraction, conversion to web-ready HTML and well-formed XML, and high-fidelity document viewing.

Features

• Document readers are thread-safe. The benefit of a thread-safe technology is that you can successfully extract text from hundreds of documents simultaneously. Documents are not queued for sequential filtering, but are actually filtered at the same time.
• Filter supports popular word processing, spreadsheet, and presentation formats. Body text, endnotes, footnotes, and additional items such as document metadata are all included as part of the filtering process.
• Sample programs are provided to demonstrate the functionality of the APIs.
• You can extract files embedded within files, such as email attachments or embedded OLE objects, by using the File Extraction API.
• Filter allows for redirected input and output. You can provide an input stream that is not restricted to file system access.
• Filter automatically recognizes the file type being filtered and uses the appropriate filter. Your application does not need to rely on file name extensions to determine file types.
• You can filter documents to specific character encodings, such as Unicode or UTF-8.
• You can write custom document readers for formats not directly supported by KeyView.

Platforms, Compilers, and Dependencies

This section lists the supported platforms, supported compilers, and software dependencies for the KeyView software.

Supported Platforms

The Java Filter SDK is supported on the following platforms.
• CentOS 7 x86, x64, and AArch64
• IBM AIX L6.1 PowerPC 32-bit and 64-bit
• IBM AIX L7.1 PowerPC 32-bit and 64-bit
• macOS 10.13 or later on 64-bit Apple-Intel architecture
• macOS 11 or later on Apple M1
• Microsoft Windows Server 2012 x64
• Microsoft Windows Server 2016 x64
• Microsoft Windows Server 2019 x64
• Microsoft Windows Server 2022 x64
• Microsoft Windows 8 x86 and x64
• Microsoft Windows 10 x86 and x64
• Microsoft Windows 11 x64
• Oracle Solaris 10 SPARC
• Oracle Solaris 10 x86 and x64
• Red Hat Enterprise Linux 7 x64
• Red Hat Enterprise Linux 8 x64
• SuSE Linux Enterprise Server 12 x64
• SuSE Linux Enterprise Server 15 x64

Supported Compilers

The following table lists the supported compilers for the Java Filter SDK.

Component          Compiler
Java components    Java 7 to 17

Software Dependencies

Running KeyView on Windows requires the Microsoft Visual C++ 2019 redistributables. The redistributables are provided in the vcredist folder of the KeyView SDK, but you can download the latest installers from Microsoft to get the latest security, reliability, and performance improvements.
Running KeyView OCR and RMS decryption on 64-bit Linux requires libstdc++.so.6 and libgcc_s.so.1 from GCC 5.4. For your convenience, these are provided in the redist folder of your KeyView installation.

NOTE: If you are running KeyView out-of-process, the kvoop executable must be able to link to libstdc++.so.6 and libgcc_s.so.1.
• If these are installed in a system folder, like /lib64, KeyView will find them automatically.
• If you prefer, you can add the path of the folder containing these libraries to the environment variable LD_LIBRARY_PATH.

If you are running KeyView in-process:
• If your application is already linking to libgcc_s and libstdc++ from GCC 5.4 or later, KeyView will use them as well and no further action is needed.
• If your application is linking to earlier versions of libgcc_s and libstdc++, Micro Focus recommends that you upgrade those binaries to those from GCC 5.4 or later.
• If your application is not linking to libgcc_s and libstdc++, you must ensure those binaries are available in the same way as described above for running KeyView out-of-process.

If older versions of libgcc_s and libstdc++ are provided (but at least those from GCC 4.8), most features will continue to work, but Optical Character Recognition and RMS Decryption will not.

Some KeyView components require specific third-party software:
• Java Runtime Environment (JRE) or Java Development Kit (JDK) version 7 to 17 is required for the Filter and Export Java APIs and for graphics conversion in the Export SDK.
• Outlook 2002 or later is required to process Microsoft Outlook Personal Folders (PST) files using the MAPI-based reader (pstsr). The native PST readers (pstxsr and pstnsr) do not require Outlook.

  NOTE: You must install an edition of Microsoft Outlook (32-bit or 64-bit) that matches the KeyView software.
  For example, if you use 32-bit KeyView, install 32-bit Outlook. If you use 64-bit KeyView, install 64-bit Outlook. If the editions do not match, KeyView returns Error 32: KVError_PSTAccessFailed and an error message from Microsoft Office Outlook is displayed:

    Either there is no default mail client or the current mail client cannot fulfill the messaging request. Please run Microsoft Outlook and set it as the default mail client.

• Lotus Notes or Lotus Domino is required for Lotus Notes database (NSF) file processing. The minimum requirement is 6.5.1, but version 8.5 is recommended.
• The Microsoft .NET Framework is required if you are using the .NET implementation of the API.

Windows Installation

To install the SDK on Windows, use the following procedure.

To install the SDK

1. Run the installation program, KeyViewProductNameSDK_VersionNumber_OS.exe, where ProductName is the name of the product, VersionNumber is the product version number, and OS is the operating system. For example:

     KeyViewFilterSDK_12.13_Windows_X86_64.exe

   The installation wizard opens.
2. Read the instructions and click Next. The License Agreement page opens.
3. Read the agreement. If you agree to the terms, click I accept the agreement, and then click Next. The Installation Directory page opens.
4. Select the directory in which to install the SDK. To specify a directory other than the default, click , and then specify another directory. After choosing where to install the SDK, click Next. The Pre-Installation Summary opens.
5. Review the settings, and then click Next. The SDK is installed.
6. Click Finish.

UNIX Installation

To install the SDK, use one of the following procedures.

To install the SDK from the graphical interface

• Run the installation program and follow the on-screen instructions.

To install the SDK from the console

1. Run the installation program from the console as follows:

     ./KeyViewFilterSDK_VersionNumber_Platform.exe --mode text

   where:
     VersionNumber    is the product version.
     Platform         is the name of the platform.
2. Read the welcome message and instructions and press Enter. The first page of the license agreement is displayed.
3. Read the license information, pressing Enter to continue through the text. After you finish reading the text, and if you accept the agreement, type Y and press Enter. You are asked to choose an installation folder.
4. Type an absolute path or press Enter to accept the default location. The Pre-Installation summary is displayed.
5. If you are satisfied with the information displayed in the summary, press Enter. The SDK is installed.

Package Contents

The Filter SDK installation contains:
• All the libraries and executables necessary for extracting text from a wide variety of formats.
• The include files that define the functions and structures used by the application to establish an interface with Filter:
    adapi.h
    adinfo.h
    kvfilter.h
    kvioobj.h
    kvcfsr.h
    kvcharset.h
    kverrorcodes.h
    kvfilt.h
    kvfilt2.h
    kvtoken.h
    kvtypes.h
    kvxtract.h
    kwautdef.h
• The Java API implemented in the package com.verity.api.filter contained in the file KeyView.jar.
• The .NET API implemented in the namespace Autonomy.API.Filter in the library FilterDotNet.dll.
• The C++ SDK, which can be found in the cppapi folder.
• Sample programs that demonstrate File Extraction and Filter functionality using the APIs.
• The files necessary to create a custom document reader, and the source for a sample document reader for UTF-8. See Develop a Custom Reader, on page 253.

License Information

Your license key controls whether you have the full version of the KeyView SDK, or a trial version.
It also determines whether the following advanced features are enabled:
• Advanced character set detection with the character set detection library (kvlangdetect).
• Advanced document readers:
  ◦ Microsoft Outlook Personal Folders (PST) readers (pstsr, pstnsr, and pstxsr)
  ◦ Lotus Notes database (NSF) reader (nsfsr)
  ◦ Mailbox (MBX) reader (mbxsr)
• Processing of documents protected by Microsoft RMS encryption.
• Optical Character Recognition (OCR) to attempt to filter text that might be visible in raster image files.

If you obtain a new license key from Micro Focus, you must update the licensing information that you pass to KeyView. See Pass License Information to KeyView.

Enable Advanced Document Readers

To enable advanced readers, you must obtain an appropriate license key from Micro Focus and pass the license key to KeyView as described in Pass License Information to KeyView.

If you are enabling the MBX reader in an existing installation of Filter, in addition to updating the license key, change the parameter 208=eml to 208=mbx in the formats.ini file.

Pass License Information to KeyView

To provide license information to KeyView, do one of the following:
• Provide the license information through the API. Micro Focus recommends using this approach.
• Provide the license information as a text file named kv.lic.

In earlier versions of KeyView, license information had to be stored in a file and included in the bin folder with the KeyView libraries. The ability to provide license information as a file has been deprecated and might be removed in future. You should no longer include license information in your application as a file. Micro Focus recommends that you pass license information to KeyView through the API instead.
If you have an evaluation version of KeyView and purchase a full version of the SDK, or you are adding a document reader (for example, the PST reader), you must update the license information that you pass to KeyView.

To provide license information through the API

• In the C API, provide license information when you initialize KeyView by calling fpInitWithLicenseData().
• In the C++ API, provide license information when you start a new session (see the constructor for the Session class).
• In the .NET API, provide license information to KeyView when you instantiate the Filter object.
• In the Java API, provide license information to KeyView when you instantiate the Filter object.

To provide license information as a file

1. Open or create the license key file, kv.lic, in a text editor. The file must be saved in the same directory as the KeyView libraries, and must contain your organization name and license key.

     COMPANY NAME
     XXXXXXX-XXXXXXX-XXXXXXX-XXXXXXX

2. Replace the text COMPANY NAME with the company name that appears at the top of the License Key Sheet provided by Micro Focus. Enter the text exactly as it appears in the document.
3. Replace the characters XXXXXXX-XXXXXXX-XXXXXXX-XXXXXXX with the appropriate license key from the License Key Sheet provided by Micro Focus. The license key is listed in the Key column in the Standalone Products table. The key is a string that contains 31 characters, for example, 2TQD22D-2M6FV66-2KPF23S-2GEM5AB. Enter the characters exactly as they appear in the document, including the dashes, but do not include a leading or trailing space.
4. The finished kv.lic file looks similar to the following:

     Autonomy
     24QD22D-2M6FV66-2KPF23S-2G8M59B

5. Save the file.

Directory Structure

The following table describes the contents of the Filter SDK. The variable OS is the operating system for which the SDK is installed.
For example, the bin directory on a standard 32-bit Windows installation would be located at KeyviewFilterSDK\WINDOWS\bin.

Installed directory structure

Directory                              Description
OS\bin                                 Contains the libraries, the format detection file formats.ini, and other supporting files, as well as the C programs filter and filtertest, which you can use to test your custom document readers (see Develop a Custom Reader, on page 253).
OS\lib (Solaris installations only)    Contains the redistributable libstlport.so.1 library, which is required to run KeyView on Solaris platforms.
dotnetapi                              Contains the source files for the .NET API.
dotnetapi\dotnethelp                   Contains the help for the .NET API.
dotnetapi\sample                       Contains the sample programs for the .NET API.
cppapi                                 Contains the source files for the C++ API.
cppapi\sample                          Contains the sample programs for the C++ API.
guide                                  Contains the KeyView Filter SDK programming guides in PDF and HTML format.
include                                Contains the header files required for Filter.
javaapi\javadoc                        Contains the Javadoc for the Java API.
javaapi\sample                         Contains the source files and sample programs for the Java API.
rel_notes                              Contains the KeyView Filter SDK Release Notes in PDF format.
samples\filter                         Contains the source code for the filter sample program demonstrating the Filter interface for the C API.
samples\pdfini                         Contains the initialization file used to extract custom metadata from PDF documents.
samples\tstxtract                      Contains a C sample program demonstrating the File Extraction interface.
samples\utf8sr                         Contains the source for the sample document reader for UTF-8 files. You can use this to create your own custom document readers.

Chapter 2: Getting Started

This section provides an overview of Filter SDK, and describes how to use the Java implementation of the API.
• Architectural Overview
• File Caching
• Filtering
• Subfile Extraction
• Use the Java Implementation of the API
• The Filter Process Model
• Run File Detection In or Out of Process
• Stream Data to Filter

Architectural Overview

The general architecture of the KeyView Filter technology is the same across all supported platforms and is illustrated in the following diagram. Each component is described in the following table.

Architectural Components

Component                  Description
Developer's Application    The developer's application interfaces directly with the Filter API through either a C-language, Java, or .NET implementation.
File Extraction API        The File Extraction API opens a file and extracts the file's subfiles so they are exposed for filtering. See Use the File Extraction API, on page 33.
Filter API                 The Filter API exposes the filtering functionality and controls all other modules during the filtering process. See Use the Filter API, on page 55.
Format Detection           This module determines the file type of the input stream, allowing the Filter API to return that information to the developer's application, or to load the appropriate structured access layer for further processing. See File Format Detection, on page 238 for more information about format detection.
Structured Access Layer    There are three modules that reside in the structured access layer: one each for word processing, spreadsheet, and presentation formats. The file detection result determines which structured access layer module is used during the filtering process. That module loads the appropriate document reader and proceeds with text extraction or metadata retrieval.
Document Readers           Each document reader reads a specific file format and sends a text stream of the document to the structured access layer. Each filter is loaded as required by the structured access layer. See Document Readers, on page 245 for a complete list of document readers.

File Caching

To reduce the frequency of I/O operations, and consequently improve performance, the KeyView readers load file data into memory. The readers then read the data from the cache rather than the physical disk.

You can configure the amount of memory used for file caching through the formats.ini file. Generally, increasing the memory improves performance. By default, KeyView uses a maximum of 1 MB of memory for each thread. If the file data is larger than 1 MB, up to 1 MB of data is cached and the data beyond 1 MB is read from disk. The minimum amount of memory that can be used for file caching is 64 KB.

To determine a reasonable value, divide the maximum amount of memory you want KeyView to use for file caching by the total number of threads. For example, if you want KeyView to use a maximum of 50 MB of memory and have 10 threads, set the value to 5 MB (that is, 5120, because the value is specified in kilobytes).

To modify the memory allocated for file caching, change the value for the following parameter in the [DiskCache] section of the formats.ini file:

    DiskCacheSize=1024

The value is in kilobytes. If this parameter is not set or is set to 0 (zero), the minimum value of 64 KB is used.

The formats.ini file is in the directory install\OS\bin, where install is the pathname of the Filter installation directory and OS is the name of the operating system.

Filtering

Filter SDK enables you to filter many different types of documents. Filtering is the process of extracting the text from a document without the application-specific markup. However, the filtering process can also include the following:
• Subfile extraction: exposes all subfiles for filtering. See Use the File Extraction API, on page 33.
• File format extraction: detects a file's format, and reports the information to the API, which in turn reports the information to the developer's application. See File Format Detection, on page 238.
• Metadata extraction: extracts selected metadata (document properties) from a file. See Extract Metadata, on page 59.
• Character set conversion: controls the character set of both the input and the output text. See Convert Character Sets, on page 62.

Subfile Extraction

To filter a file, you must first determine whether the file contains any subfiles (attachments, embedded OLE objects, and so on). A file that contains subfiles is called a container file. Archive files (such as ZIP), mail messages with attachments (such as Microsoft Outlook Express), mail stores (such as Microsoft Outlook Personal Folders), and compound documents with embedded OLE objects (such as a Microsoft Word document with an embedded Excel chart) are examples of container files.

If the file is a container file, the container must be opened and its subfiles extracted using the File Extraction interface. The extraction process is done repeatedly until all subfiles are extracted and exposed for filtering. Once a subfile is extracted, you can use the Filter API to filter the file.

If a file is not a container, you should pass it directly to the Filter API for filtering without extraction.

The ExtractFilter sample program demonstrates this logic for extracting and filtering files. See Use the File Extraction API, on page 33 for more information.

Use the Java Implementation of the API

The Java version of the Filter API provides an interface to the core functionality of the C API. It contains one primary class (Filter) that wraps the filter functionality of the C API. It is implemented in the package com.verity.api.filter contained in the file KeyView.jar.
The JAR file is in the directory install\javaapi, where install is the path name of the Filter installation directory.

For more information on the Java API, see the Javadoc in the directory install\javaapi\javadoc, and Sample Programs, on page 91.

Input/Output Operations

Methods in the Filter Java API have signatures that support a variety of input and output methods. The input source can usually be a physical file accessed through a file path, a com.verity.api.SeekableInputStream, or a standard java.io.InputStream. You can send the output to a file or java.io.OutputStream, or return it one chunk at a time in a byte array.

You can set the input source by calling the setInputSource method. Alternatively, you can supply it as a parameter when you use the doFilter, canFilter, canFilterEx, getDocFormatInfo, or getSummaryInfo methods.

KeyView needs to access different parts of files while it is filtering. When the input source is a stream, Micro Focus recommends passing a SeekableInputStream into KeyView, because it allows KeyView to only read the parts of the stream it needs to read. If you use a Java InputStream, KeyView must store the stream as it is received, writing to a temporary file if the stream is large.

If you use a Java InputStream as the source, there are two available method signatures for functions. One method signature allows you to pass in the stream size. If you do not supply the stream size, KeyView reads the entire stream before processing starts. If you can provide the stream size, KeyView might not need to read the whole stream.

Filter in File or Stream Mode

To filter files using the methods in the Filter class

1. Instantiate a Filter object using either the default constructor or the constructor that sets the output character set and filter flags:
   a. Use the default constructor Filter(). For example:

        m_objFilter = new Filter();

   b. Use the constructor Filter(java.lang.String outputCharSet, long filterFlags).
      For example:
          m_objFilter = new Filter(outputCharSet, Filter.FILTERFLAG_OOPLOGON);
   The Filter flags provide instructions on how to process a file or stream. For example, they specify whether an error log is generated during filtering (FILTERFLAG_OOPLOGON) or whether headers and footers are extracted from the document (FILTERFLAG_HEADERFOOTERTAGS).
   NOTE: Filter runs out of process by default. See The Filter Process Model, on the next page for more information.
2. Set the location of the Filter libraries by calling the setFilterDirectory(java.lang.String directory) method. These libraries are normally stored in the directory install\OS\bin, where install is the path name of the Filter installation directory and OS is the name of the operating system. For example:
       m_objFilter.setFilterDirectory(m_filterDirectory);
3. Set the input source as either a file or input stream by calling the setInputSource method.
       m_objFilter.setInputSource(m_extractDir + filename);
4. Filter the file or stream by calling either the filterTo or doFilterChunk method. The filterTo method extracts the data to a file or a stream. The doFilterChunk method extracts one chunk of data from a file or a stream. It must be called repeatedly until the entire buffer is filtered.
   If filtering in file mode, use the following code:
       {
           m_objFilter.filterTo(m_extractDir + filename + m_extension);
       }
   If filtering in stream mode, use the following code:
       {
           outf = new File(m_extractDir + filename + m_extension);
           fos = new FileOutputStream(outf);
           m_objFilter.filterTo(fos);
           fos.close();
       }
5. Terminate the filtering session and free allocated system resources by calling the shutdownFilter() method.
       m_objFilter.shutdownFilter();

Multithreaded Filtering

To ensure multithreaded filter processes are thread-safe, you must create a unique Filter context for every thread by instantiating a Filter object.
In addition, threads must not share context objects, and the same context object must be used for all API calls in the same thread. Creating a context object for every thread does not affect performance because the context object uses minimal resources.

For example, your Java code should have the following logic in a thread:
    m_objFilter = new Filter();
    m_objFilter.setFilterDirectory(m_filterDirectory);
    m_objFilter.setInputSource(infile);
    m_objFilter.getDocFormatInfo();
    if (m_objFilter.canFilter() == true)
        m_objFilter.filterTo(outfile);
    m_objFilter.shutdownFilter();

Before Running Your Application

Before running your application, you must set the library path using one of the following methods:
• On Windows, add the location of KeyViewFilter.dll to the PATH environment variable.
• On Solaris, Linux, and HP-UX IA-64, add the location of libKeyViewFilter.so to the LD_LIBRARY_PATH environment variable.
• On HP-UX PA-RISC, add the location of libKeyViewFilter.sl to the SHLIB_PATH environment variable.
• On AIX, add the location of libKeyViewFilter.a to the LIBPATH environment variable.
• You can also specify the library path as a system property as follows:
      java -Djava.library.path=filter_bin_directory ...

The Filter Process Model

By default, Filter runs independently from the calling application process. This is called out-of-process filtering. Out-of-process filtering protects the stability of the calling application in the rare case when a malformed document causes Filter to fail. You can configure Filter to run in the same process as the calling application. This is called in-process filtering. However, it is strongly recommended that you run Filter out of process whenever possible.

The creation of child processes on UNIX usually adheres to Portable Operating System Interface (POSIX) standards. AIX uses different thread semantics.
If required, a version of kvfilter with POSIX thread semantics is available for AIX. This file is kvfilter_nsl.a. It must be renamed to kvfilter.a to be used by Filter.

To monitor and debug filtering operations during out-of-process filtering, you can generate an error log at run time. See Generate an Error Log, on page 55.

The following methods run in process or out of process:

Filter API
    canFilter
    canFilterEx
    doFilter
    doFilterChunk
    getSummaryInfo
    getDocFormatInfo

File Extraction API
    extCloseDocument
    extGetSubFileInfo
    extGetSubFileMetadata
    extExtractSubFile
    extGetMainFileInfo
    extOpenDocument
    getSummaryInfo
    KVGetExtractInterface()

Other Filter API methods always run in process.

Persist the Child Process

By default, in out-of-process filtering, the parent process maintains a persistent connection with the child server after each file is filtered. When the connection is preserved in this way, subsequent filtering requests are processed more quickly because the server is already prepared to receive data. You can restart the server at regular intervals by using a method or a configuration setting.

In the API

To force KeyView to restart, call the refreshFilterKVOOP() method.
    public void refreshFilterKVOOP();

In the formats.ini File

To control whether Filter persists the server, use the kvoopRefresh parameter in the [FilterSDK_Config] section of the formats.ini file:

kvoopRefresh=0    When this is set to 0 (zero), the connection to the server is persisted for as long as the parent process is running or until the server fails. This is the default.
kvoopRefresh=n    When this is set to n, the connection is persisted for n filter requests. After the nth request, the server is shut down and restarted before processing the next request. For example, if kvoopRefresh=5, the connection to the server is persisted for 5 filter requests.
For the 6th request, the server is shut down and restarted.

To control whether the parent process attempts to filter a file after the file has caused the server to fail, use the kvoopRetry parameter in the [FilterSDK_Config] section of the formats.ini file:

kvoopRetry=0    When this is set to 0 and the server fails, the parent process does not resend the file to a new server.
kvoopRetry=n    When this is set to n (a positive number) and the server fails, the parent process resends the file to a new server n times. By default, kvoopRetry is set to 1, and the file is resent to a server once.

The formats.ini file is in the directory install\OS\bin, where install is the path name of the Filter installation directory and OS is the name of the operating system.

NOTE: The kvoopRefresh and kvoopRetry parameters do not apply when running the File Extraction functions out of process. See Run File Extraction Functions Out of Process, below.

Run Filter In Process

By default, Filter runs out of process. However, you can enable in-process filtering through the API or in the formats.ini file. If the type of process is not specified in the formats.ini or in the API, then Filter is run out of process. If the type of process is specified in the formats.ini and in the API, the setting in the API takes precedence.

In the API

To run Filter in process, instantiate the Filter object using the constructor Filter(java.lang.String outputCharSet, long filterFlags), and set the filterFlags argument to FILTERFLAG_INPROCESS.
    objFilter = new Filter(outputCharSet, Filter.FILTERFLAG_INPROCESS);

In the formats.ini File

To run Filter in process, set the following parameter in the [FilterSDK_Config] section of the formats.ini file to 1:
    default_inprocess=1
By default this is set to 0 (zero), which enables out-of-process filtering.
The formats.ini file is in the directory install\OS\bin, where install is the path name of the Filter installation directory and OS is the name of the operating system.

Run File Extraction Functions Out of Process

The out-of-process setting specified when you create the Filter object or in the formats.ini is automatically propagated to the File Extraction API. When you extract subfiles from container files and pass the files for filtering out of process, Filter starts a server process called kvoop.exe for filtering and a second server process, also called kvoop.exe, for file extraction. These servers are independent, so if the filtering service stops responding, the file extraction service can continue extracting files uninterrupted.

Restart the File Extraction Server

If the file extraction server fails on a file and throws the exception KVError_InvalidOopDriverSignature or KVError_InvalidOopServiceSignature, you must restart the server by recreating the Filter object, and process the source file again.

Out-of-Process Logging

Logging is available for out-of-process filtering. The kvoop server can create a log file that captures information on the files being processed, storing one entry per process. The generated log file is called xxxx_kvoop.log, where xxxx is a unique number identifying the process. In the rare case when the kvoop server fails, you can use the log files to determine which file caused the failure.

After processing is complete and the system shuts down, the logs are automatically deleted. To keep the log files after processing is successfully completed, see Keep Log Files, on the next page.

NOTE: Out-of-process logging is available only on certain platforms (see Out-of-process logging in the platform differences section).
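When a kvoop server fails, the first practical step is usually to locate the log files it left behind. The following is a minimal sketch, not part of the KeyView API, that lists files matching the xxxx_kvoop.log naming pattern described above. The class name KvoopLogFinderSketch and the method findLogs are hypothetical, and the sketch assumes that you pass in the directory where the logs are stored and that the log files have not yet been deleted.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class KvoopLogFinderSketch {

    // Matches file names of the form <number>_kvoop.log.
    private static final Pattern LOG_NAME = Pattern.compile("\\d+_kvoop\\.log");

    /**
     * Hypothetical helper: returns the kvoop log files found directly in
     * logDir, sorted by file name.
     */
    public static List<Path> findLogs(Path logDir) throws IOException {
        List<Path> logs = new ArrayList<>();
        // Files.list must be closed, so use try-with-resources.
        try (Stream<Path> entries = Files.list(logDir)) {
            entries.filter(Files::isRegularFile)
                   .filter(p -> LOG_NAME.matcher(p.getFileName().toString()).matches())
                   .forEach(logs::add);
        }
        logs.sort(Comparator.comparing((Path p) -> p.getFileName().toString()));
        return logs;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the log directory is given as the first argument.
        Path logDir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path log : findLogs(logDir)) {
            System.out.println(log);
        }
    }
}
```

The sketch only finds the files; reading them to identify the document that caused a failure depends on the log contents, which are not described here.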
Enable Out-of-Process Logging

To enable out-of-process logging, set the KVOOP_LOGS_DIR environment variable to the directory in which you want the log files to be stored. By default, logging is not enabled.

On UNIX, the variable is set as follows:
    setenv KVOOP_LOGS_DIR /tmp
On Windows, the variable is set as follows:
    set KVOOP_LOGS_DIR=c:\tmp

The following log file is created in the directory:
    process_id_kvoop.log
where process_id is a numeric value representing the logged process. New messages are appended to the file, and truncation is disabled by default.

If KeyView terminates unexpectedly and Windows minidump is enabled, a process_id_crash_info.txt file is generated (see Enable Windows Minidump, on the next page). If logging was not enabled at the time of termination, this file contains instructions on how to enable logging.

Set the Verbosity Level

You can control how much information is written to the file by setting the KVOOP_LOG_VERBOSITY environment variable. For example:
    set KVOOP_LOG_VERBOSITY=1

The variable can be set to the following:
    1    Include only error messages.
    2    Include errors and warnings.
    3    Include errors, warnings, and general information. This is the default.
    4    Include all possible information. This setting is useful for debugging purposes.

Enable Windows Minidump

KeyView can use the Windows minidump feature to provide additional logging information, which can be useful for debugging purposes. The Windows minidump is disabled by default. To enable the Windows minidump, set KVOOP_DUMP_ENABLE=1.

If an unexpected termination occurs after the minidump is enabled, three files are generated:
• process_id_crash_info.txt. This file contains KVOOP state and runtime information at the time of termination. If logging was not enabled at the time of termination, this file contains instructions on how to enable logging.
• process_id_process_list.txt. This file contains information from the DLLs that were loaded at the time of the termination.
• process_id_report.dmp. The Windows dump file, which contains further information about the termination. You can open it with either a Windows debugger or autnhelper.exe (you must copy this file to the same directory).

You can control the amount of information presented in the Windows dump file by creating the following files in the directory:
    dumper.NORMAL
    dumper.WITHDATASEGS
    dumper.WITHFULLMEMORY
    dumper.WITHHANDLEDATA

Keep Log Files

After processing is complete and the system is shut down, the log files are automatically deleted from the directory. To keep the log files after a successful run, set the KVOOP_KEEP_LOGS environment variable.

On UNIX, set the variable as follows:
    setenv KVOOP_KEEP_LOGS 1
On Windows, set the variable as follows:
    set KVOOP_KEEP_LOGS=1

Run File Detection In or Out of Process

By default, detection runs in out-of-process mode. However, you can enable in-process detection through the API or in the formats.ini file. If the type of process is not specified in the formats.ini or in the API, Filter runs in out-of-process mode. If the type of process is specified in the formats.ini and in the API, the setting in the API takes precedence.

Specify the Process Type In the formats.ini File

Add the default_detect_inprocess flag to the [FilterSDK_Config] section in the formats.ini file to control the default behavior for detection. Set default_detect_inprocess to 0 for out-of-process detection, and 1 for in-process detection. For example:
    [FilterSDK_Config]
    default_detect_inprocess=0
If this flag is not specified, the file detection behavior is determined by the default_inprocess flag for filtering.
For example, if you set default_inprocess to 1, filtering and file detection run in in-process mode by default; if you set default_inprocess to 0, filtering and file detection run in out-of-process mode by default.

If both the default_inprocess and default_detect_inprocess flags are set, then default_inprocess controls the default filtering behavior and default_detect_inprocess controls the default file detection behavior.

Specify the Process Type In the API

To run detection in in-process mode, instantiate the Filter object by using the constructor Filter(java.lang.String outputCharSet, long filterFlags), and set the filterFlags argument to FILTERFLAG_DETECTINPROCESS. To run detection in out-of-process mode, set FILTERFLAG_DETECTOUTOFPROCESS.
    objFilter = new Filter(outputCharSet, Filter.FILTERFLAG_DETECTINPROCESS);

Stream Data to Filter

By default, when you run Filter out of process and pass file streams to the API (instead of file names), Filter uses temporary files during communication.

When running out of process, you can configure KeyView to stream the file data while it processes it, rather than creating temporary files. This method is particularly beneficial if you do not want to process the whole file (for example, if you want to stop after filtering only some of the text, or extract only some of the subfiles).

NOTE: This option is disabled by default because for some files it might result in a longer processing time when you do need to process the whole file.

To turn on streaming mode, you can either:
• Set at least one of the following streaming parameters in the [FilterSDK_Config] section of the formats.ini to pipe:
    streaming_method            Set this parameter to pipe to change the overall behavior for filtering and extraction to use streaming mode. By default this parameter is set to temp, which uses temporary files during the filter process.
    filter_streaming_method     Set this parameter to pipe to configure filtering to use streaming mode. If you do not set this parameter, KeyView uses the value of streaming_method.
    extract_streaming_method    Set this parameter to pipe to configure extraction to use streaming mode. If you do not set this parameter, KeyView uses the value of streaming_method.
• Set the filter streaming options in the API.

The streaming method has a number of advantages:
• It reduces the disk space used for temporary files.
• It improves the responsiveness for partial filtering. When using the temp_file method, your first call to filterFileToStream or filterStreamToStream does not return until the entire file has been processed. When using the pipe method, these functions return the first block of text as soon as it is available.
• It reduces the I/O for partial filtering. When you use the pipe method, it might not be necessary for KeyView to read the whole input file, especially if you choose to stop filtering before all the text has returned.
• For many formats, it reduces the amount of the input file that is read during extraction, especially if you extract only a subset of the files.

Part II: Use Filter SDK

This section explains how to perform some basic tasks by using the File Extraction and Filter APIs, and describes the sample programs.
• Use the File Extraction API
• Use the Filter API
• Sample Programs

Chapter 3: Use the File Extraction API

This section describes how to extract subfiles from a container file using the File Extraction API.
· Introduction 33
· Extract Subfiles 34
· Extract Images 36
· Recreate a File Hierarchy 36
· Extract Mail Metadata 38
· Extract Subfiles from Outlook Files 44
· Extract Subfiles from Outlook Express Files 44
· Extract Subfiles from Mailbox Files 44
· Extract Subfiles from Outlook Personal Folders Files 45
· Extract Subfiles from Lotus Domino XML Language Files 48
· Extract Subfiles from Lotus Notes Database Files 49
· Extract Subfiles from PDF Files 51
· Extract Embedded OLE Objects 52
· Extract Subfiles from ZIP Files 52
· Default File Names for Extracted Subfiles 53

Introduction

To filter a file, you must first determine whether the file contains any subfiles (attachments, embedded OLE objects, and so on). A file that contains subfiles is called a container file. A container file has a main file (parent) and subfiles (children) embedded in the main file.

The following are examples of container files:
• Archive files such as ZIP, TAR, and RAR.
• Mail messages such as Outlook (MSG) and Outlook Express (EML).
• Mail stores such as Microsoft Outlook Personal Folders (PST), Mailbox (MBX), and Lotus Notes database (NSF).
• PDF files that contain file attachments.
• Compound documents with embedded OLE objects such as a Microsoft Word document with an embedded Excel chart.

NOTE: Document Readers, on page 177 indicates which formats are treated as container files and are supported by the File Extraction API.

The subfiles might also be container files, creating a file hierarchy of multiple levels. For example, an MSG file (the root parent) might contain three attachments:
• a Microsoft Word document that contains an embedded Microsoft Excel spreadsheet.
• an AutoCAD drawing file (DWG).
• an EML file with an attached ZIP file, which in turn contains four archived files.

NOTE: The parent MSG file contains four first-level children.
The body text of a message file, although not a standalone file in the container, is considered a child of the parent file.

Extract Subfiles

To filter all files in a container file, the container must be opened and its subfiles extracted to either a file or a stream using the File Extraction API. The extraction process is done repeatedly until all subfiles are extracted and exposed for filtering. Once a subfile is extracted, you can call Filter API methods to filter the data.

If you require a container file, including subfiles, to be filtered to a single file, you must extract all files from the container, filter the files, and then append each filtered output to its parent.

To extract subfiles, follow this general procedure

1. Open the source file by calling the extOpenDocument method. This call defines the parameters necessary to open a file for extraction.
2. Determine whether the main file is a container file (contains subfiles) by calling the extGetMainFileInfo() method.
3. If the call to extGetMainFileInfo() determined the source file is a container file, proceed to Step 4; otherwise, filter the file.
4. Determine whether the subfile is itself a container (contains subfiles) by calling the extGetSubFileInfo method.
5. Extract the subfile by calling the extExtractSubFile method.
6. If the call to extGetSubFileInfo determined the subfile is a container file, repeat Step 1 through Step 5 until all subfiles are extracted and the lowest level of subfiles is reached; otherwise, filter the file.

Sanitize Absolute Paths

When you extract a subfile from a container and write it to disk, you specify an extract directory and a path to extract the file to. To set the path, you might use the path in the container file that you are extracting from, as returned from the Filter.extGetSubFileInfo() method.
However, if the path is an absolute path, the file could be created outside the directory you have chosen as the extract directory. Your application might then contain a vulnerability that could be exploited to write files to unexpected locations in the file system. This section discusses some KeyView features that can help you secure your application by sanitizing paths.

KeyView always sanitizes relative paths that you pass in when extracting files, so that the paths remain within the extract directory you specify. For example, KeyView does not allow the use of ".." to move outside the extract directory.

KeyView can update absolute paths so that they remain within the extract directory. You can instruct KeyView to sanitize absolute paths programmatically (through the API), or by setting a parameter in the configuration file.

The following table shows the effect on some example paths.

    Requested path    Path of extracted file (not sanitized)    Path of extracted file (sanitized)
    file.txt          extractDir/file.txt                       extractDir/file.txt
    dir/file.txt      extractDir/dir/file.txt                   extractDir/dir/file.txt
    ../file.txt       extractDir/file.txt                       extractDir/file.txt
    /dir/file.txt     /dir/file.txt                             extractDir/dir/file.txt

To sanitize absolute paths

• Call the method setSanitizeAbsolutePaths on the ExtSubFileExtractConfig that you pass in to extExtractSubFile.
  When KeyView sanitizes a path and the resulting directory does not exist, extraction fails unless you instruct KeyView to create the directory, so you might also want to call the method setCreateDirectory.
  You can find the path that a file was actually extracted to from the ExtSubFileExtractInfo object that is returned from the extExtractSubFile method.
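The mapping in the example paths above can be approximated in plain Java. The following sketch is not KeyView's implementation and does not call the KeyView API; it only illustrates one way an application could confine a requested path to an extract directory. The class name PathSanitizerSketch and the method resolveInside are hypothetical, and the sketch simply discards "..", ".", root, and drive-letter components, which is an assumption rather than documented KeyView behavior.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class PathSanitizerSketch {

    /**
     * Hypothetical helper (not part of the KeyView API): maps a requested
     * subfile path to a location inside extractDir.
     */
    public static Path resolveInside(Path extractDir, String requested) {
        Path base = extractDir.normalize();
        Path result = base;
        // Split on either separator; a leading "/" produces an empty first part.
        for (String part : requested.split("[/\\\\]+")) {
            boolean isDriveLetter = part.matches("[A-Za-z]:");
            if (part.isEmpty() || part.equals(".") || part.equals("..") || isDriveLetter) {
                continue; // drop components that could climb out of extractDir
            }
            result = result.resolve(part);
        }
        // Defensive check: the result must still be under the extract directory.
        if (!result.startsWith(base)) {
            throw new IllegalArgumentException("Path escapes extract directory: " + requested);
        }
        return result;
    }

    public static void main(String[] args) {
        Path extractDir = Paths.get("extractDir");
        String[] requestedPaths = {"file.txt", "dir/file.txt", "../file.txt", "/dir/file.txt"};
        for (String requested : requestedPaths) {
            System.out.println(requested + " -> " + resolveInside(extractDir, requested));
        }
    }
}
```

A check of this kind can be useful as a second line of defense in application code, even when sanitization is also enabled in KeyView.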
To sanitize absolute paths (through configuration)

• In the formats.ini configuration file, set the parameter SanitizeAbsoluteExtractPaths, for example:
      [Options]
      SanitizeAbsoluteExtractPaths=TRUE

Extract Images

You can use the File Extraction API to extract images within a file. If you use this feature, images within the file behave in the same way as any other subfile. Extracted images have the name image[X].[Y], where [X] is an integer, and [Y] is the extension. The format of the image is the same as the format in which it is stored in the document.

NOTE: Turning on ExtractImages can reduce the speed of the filtering operation.

To extract images

• In the Java API, call the setExtractImages method on the filter object, for example:
      filter.setExtractImages(true);
• In formats.ini, set the following parameter. (This is an alternative approach. You do not need to do this if you have configured this feature through the API.)
      [Options]
      ExtractImages=TRUE

Recreate a File Hierarchy

When a container file is extracted, any relationships between the subfiles in the container are not maintained. However, the File Extraction interface provides information that enables you to recreate the hierarchy. The hierarchy can be used to create a directory structure in a file system, or to categorize documents according to their relationship to each other. For example, if you use KeyView to generate text for a search engine, the hierarchical information enables your users to search for a document based on the document's parent or sibling. In addition, when the document is returned to the user, the parent and sibling documents can be returned as recommendations.

The information needed to recreate a file's hierarchy is provided in the call to extGetSubFileInfo.
Call this method to retrieve an object of the ExtSubFileInfo class, then use the getParentIndex() and getChildArray() methods in this object to retrieve information about the subfile's parent and children. Because you can retrieve only the first-level children of a subfile, you must call extGetSubFileInfo repeatedly until information for the leaf-node children is extracted.

Create a Root Node

Because of their structure, some container files do not contain a subfile or folder which acts as a root directory on which the hierarchy can be based. For example, subfiles in a ZIP archive can be extracted, but none of the subfiles represent the root of the hierarchy. In this case, an artificial root node must be created at the top of the file hierarchy as a point of reference for each child, and ultimately to recreate the relationships. This artificial root node is an internal object, and is extracted to disk as a directory called root. Its index number is 0.

To create a root node, call the setCreateNode method in the ExtOpenDocConfig object, and pass ExtOpenDocConfig to the extOpenDocument method.

When a root node is created, the value returned from the getNumSubFiles method in the ExtMainFileInfo object includes the root node. For example, when you call extGetMainFileInfo on a Microsoft Word document with three embedded OLE objects and the root node is disabled, the number of subfiles is 3. If you create a root node, the number of subfiles is 4.

Example

For example, you might extract a PST file that contains seven subfiles with a root node enabled. The call to extGetMainFileInfo() returns the number of subfiles as 8 (seven subfiles and one root node). The following diagram shows the structure and the available hierarchy information after the subfiles are extracted:

[Figure: Extracted PST File]

The parentIndex specifies the index number of a subfile's parent.
The childArray specifies an array of a subfile's children. With this information, you can recreate the hierarchy shown in the following diagram:

[Figure: Recreated File Hierarchy]

Extract Mail Metadata

You can extract metadata such as subject, sender, and recipient from subfiles of mail formats by calling the extGetSubFileMetadata() method. You can extract a predefined set of metadata fields, or a list of metadata fields by their names or MAPI properties.

Default Metadata Set

KeyView internally defines a set of common mail metadata fields that can be extracted as a group from mail formats. This default metadata set is listed in the following table.

Default Mail Metadata List

    Field Name (string to specify)    Description
    From                              The display name and email address of the sender.
    Sent                              The time the message was sent.
    To                                The display names and email addresses of the recipients.
    Cc                                The display names and email addresses of recipients who receive copies of the email.
    Bcc                               The display names and email addresses of recipients who received blind copies of the email.
    Subject                           The text in the subject line of the message.
    Priority                          The priority applied to the message.

Because mail formats use different terms for the same fields, the format's reader maps the default field name to the appropriate format-specific name. For example, when retrieving the default metadata set, the NSF field Importance is mapped to the name Priority and is returned.

You can also extract the default field names individually by passing the field name (such as From, To, and Subject); however, in this case, the string is not mapped to the format-specific name.
For example, if you pass Priority in the call, you will retrieve the contents of the Priority field from an MBX file, but will not retrieve the contents of the Importance field from an NSF file.

NOTE: You cannot pass the field names listed in MSG-Specific Metadata List, on the next page individually for PST files. However, you can pass either the MAPI tag number or one of the constants in the Filter class as integers. See Microsoft Personal Folders File (PST) Metadata, on page 42.

Extract the Default Metadata Set

To extract the default metadata set, call the extGetSubFileMetadata(long docContextID, int nSubFileIndex, ExtSubFileMetaConfig config) method. For example:
    ExtSubFileMetaConfig metaConfig = new ExtSubFileMetaConfig();
    ExtSubFileMetadata subfilemeta = null;
    subfilemeta = m_objFilter.extGetSubFileMetadata(extContextID, index, metaConfig);

Extract All Metadata

KeyView can extract all metadata from subfiles of MSG, EML, MBX, MIME, NSF, ICS, and DXL mail containers. To extract all metadata, call the setAllMetadata() method of the ExtSubFileMetaConfig object, and pass ExtSubFileMetaConfig to the extGetSubFileMetadata method. For example:
    config = new ExtSubFileMetaConfig();
    config.setAllMetadata(true);
    subFileMetadata = export.extGetSubFileMetadata(extContextID, i, config);

Microsoft Outlook (MSG) Metadata

In addition to the default metadata set, the metadata fields listed in the following table can be extracted for MSG files. The field name must be passed to metaNameArray in the call to the extGetSubFileMetadata() method.

MSG-Specific Metadata List

    Field Name (string to specify)    Description
    AttachFileName                    An attachment's long file name and extension, excluding path.
    ConversationTopic                 The topic of the first message in a conversation thread. A conversation thread is a series of messages and replies. This is the first message's subject with any prefix removed.
    CreationTime                      The time the message or attachment was created. This value is displayed in the Sent field in the message's Properties dialog in Outlook.
    InternetMessageID                 The identifier for messages that come in over the Internet. This is the MAPI property PR_INTERNET_MESSAGE_ID. This property is not in the MAPI headers or MAPI documentation.
    LastModificationTime              The time the message or attachment was last modified. This value is displayed in the Modified field in the message's Properties dialog in Outlook.
    Location                          The physical location of the event specified in the Outlook calendar entry.
    MessageID                         The message transfer system (MTS) identifier for the message transfer agent (MTA). This value is displayed on the Message ID tab in the message's Properties dialog in Outlook.
    Received                          The date and time a message was delivered. This value is displayed in the Received field in the message's Properties dialog in Outlook.
    Sender                            The name and email address of the message sender. This value is a concatenation of two MAPI properties in the following format: "PR_SENDER_NAME" <PR_SENDER_EMAIL_ADDRESS>. The Sender value might be the same as or different than the default metadata From value (see Default Metadata Set, on page 38), depending on which MAPI properties exist in the MSG file.
    Sensitivity                       The value indicating the message sender's opinion of the sensitivity of a message, such as Personal, Private, or Confidential. This value is displayed in the Sensitivity field in the message's Properties dialog in Outlook.
    TransportMsgHeaders               Contains transport-specific message envelope information. This value corresponds to the MAPI property PR_TRANSPORT_MESSAGE_HEADERS.
    StartDate                         Contains an appointment start date. This value corresponds to the PR_START_DATE MAPI property.
    EndDate                           Contains an appointment end date. This value corresponds to the PR_END_DATE MAPI property.

Extract MSG-Specific Metadata

To extract specific metadata fields from an MSG file, use the method extGetSubFileMetadata(long docContextID, int nSubFileIndex, java.lang.String[] metaNameArray, ExtSubFileMetaConfig config) and pass the field name defined in MSG-Specific Metadata List, on the previous page to metaNameArray (the string is not case sensitive). For example, the following code extracts the contents of the ConversationTopic and MessageID fields:
    ExtSubFileMetaConfig metaConfig = new ExtSubFileMetaConfig();
    ExtSubFileMetadata subfilemeta = null;
    String[] metaNameArray = {"conversationtopic", "MessageID"};
    subfilemeta = m_objFilter.extGetSubFileMetadata(extContextID, index, metaNameArray, metaConfig);

Microsoft Outlook Express (EML) and Mailbox (MBX) Metadata

In addition to the default metadata set, you can extract any metadata field that exists in the header of an EML or MBX file by passing the field's name. If the name is a valid field in the file, the contents of the field are returned. For example, to retrieve the name of the last mail server that received the message before it was delivered, you can pass the string "Received".

Extract EML- or MBX-Specific Metadata

To extract specific metadata fields from an EML or MBX file, use the method extGetSubFileMetadata(long docContextID, int nSubFileIndex, java.lang.String[] metaNameArray, ExtSubFileMetaConfig config) and pass the metadata name to metaNameArray (the string is not case sensitive).
For example, the following code extracts the contents of the Received and Mime-version fields:

    ExtSubFileMetaConfig metaConfig = new ExtSubFileMetaConfig();
    ExtSubFileMetadata subfilemeta = null;
    String[] metaNameArray = {"Received", "Mime-version"};
    subfilemeta = m_objFilter.extGetSubFileMetadata(extContextID, index, metaNameArray, metaConfig);

Lotus Notes Database (NSF) Metadata

In addition to the default metadata set, you can extract any Lotus field name that exists in an NSF file by passing the field's name. (You can extract fields from mail NSF files and non-mail NSF files.) If the name is a valid field in the file, the field is returned. For example, to retrieve the date a document in an NSF file was last accessed, you would pass the string "$LastAccessedDB".

NOTE: A complete list of NSF fields is provided in the Lotus Notes file stdnames.h. This header file is available in the Lotus API Toolkit.

Extract NSF-Specific Metadata

To extract specific metadata fields from an NSF file, use the method extGetSubFileMetadata(long docContextID, int nSubFileIndex, java.lang.String[] metaNameArray, ExtSubFileMetaConfig config) and pass the metadata name to metaNameArray (the string is not case sensitive). For example, the following code extracts the contents of the Description and Categories fields:

    ExtSubFileMetaConfig metaConfig = new ExtSubFileMetaConfig();
    ExtSubFileMetadata subfilemeta = null;
    String[] metaNameArray = {"description", "Categories"};
    subfilemeta = m_objFilter.extGetSubFileMetadata(extContextID, index, metaNameArray, metaConfig);

Microsoft Personal Folders File (PST) Metadata

In addition to the default metadata set, you can extract Messaging Application Programming Interface (MAPI) properties from a PST file. These properties describe elements (subject, sender, recipient, and so on) of Outlook items within the PST file. Because the properties are stored in the PST file itself, they can be retrieved before the contents of the PST are extracted. This enables you to determine whether an Outlook item should be extracted based on a subfile's attributes. MAPI properties are also stored for Outlook attachments that are not mail messages (such as an attached Microsoft Word document or Lotus 1-2-3 file).

MAPI Properties

Each MAPI property is identified by a property tag, which is a constant that contains the property type and a unique identifier. For example, the property that indicates whether a message has attachments has the following components:

Property: PR_HASATTACH
Identifier: 0x0E1B
Property type: PT_BOOLEAN (000B)
Property tag: 0x0E1B000B

The Microsoft MAPI documentation on the Microsoft Developer Network website lists all available MAPI properties, their tags, and types. You can retrieve any MAPI property that is of one of the MAPI property types listed below:

PT_I2, PT_DOUBLE, PT_STRING8, PT_I4, PT_FLOAT, PT_TSTRING, PT_BINARY, PT_LONG, PT_SYSTIME, PT_BOOLEAN, PT_SHORT, PT_UNICODE

NOTE: Properties with a PT_TSTRING type have the property type recompiled to either a Unicode string (PT_UNICODE) or to an ANSI string (PT_STRING8) depending on the operating system's character set. To retrieve the Unicode property, pass in the Unicode version of the tag. For example, the property tag for PR_SUBJECT is either 0x0037001E for an ANSI string, or 0x0037001F for a Unicode string.

Extract PST-Specific Metadata

In the call to extract subfile metadata, you can pass either the MAPI tag number (such as 0x0070001e) or one of the constants in the Filter class (such as KVPR_SUBJECT). These constants are a subset of MAPI properties and use a KeyView naming convention. For example, the property PR_CONVERSATION_TOPIC is defined as KVPR_CONVERSATION_TOPIC.
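The relationship between identifier, property type, and property tag described in MAPI Properties can be sketched in a few lines of Java. This is an illustration only: the class and method names (MapiTagSketch, propertyTag, toUnicodeTag) are hypothetical and are not part of the KeyView API. The type codes are the standard MAPI values that appear in the tags quoted above (000B, 001E, 001F).

```java
final class MapiTagSketch {
    // MAPI property type codes; they occupy the low 16 bits of a property tag.
    static final int PT_BOOLEAN = 0x000B;
    static final int PT_STRING8 = 0x001E;
    static final int PT_UNICODE = 0x001F;

    /** Combines a 16-bit identifier and a 16-bit property type into a tag. */
    static int propertyTag(int identifier, int propertyType) {
        return ((identifier & 0xFFFF) << 16) | (propertyType & 0xFFFF);
    }

    /** Returns the identifier stored in the high 16 bits of a tag. */
    static int identifierOf(int tag) {
        return tag >>> 16;
    }

    /** Returns the property type stored in the low 16 bits of a tag. */
    static int typeOf(int tag) {
        return tag & 0xFFFF;
    }

    /** Returns the PT_UNICODE form of an ANSI string tag; other tags are unchanged. */
    static int toUnicodeTag(int tag) {
        return typeOf(tag) == PT_STRING8
                ? propertyTag(identifierOf(tag), PT_UNICODE)
                : tag;
    }

    public static void main(String[] args) {
        // PR_HASATTACH: identifier 0x0E1B, type PT_BOOLEAN
        System.out.printf("0x%08X%n", propertyTag(0x0E1B, PT_BOOLEAN)); // prints 0x0E1B000B
        // PR_SUBJECT: ANSI tag converted to its Unicode form
        System.out.printf("0x%08X%n", toUnicodeTag(0x0037001E));        // prints 0x0037001F
    }
}
```

A tag built this way is an ordinary int, so it can be placed in the int[] array that is passed when you request MAPI properties by tag number.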
If the property you want to retrieve is not defined as a constant in the Filter class, you must pass the MAPI tag number.

To extract specific MAPI properties from a PST file, use the method extGetSubFileMetadata(long docContextID, int nSubFileIndex, int[] metaNameArray, ExtSubFileMetaConfig config) and pass the tag number or constant to metaNameArray. For example, the following code extracts the MAPI properties PR_SUBJECT and PR_ALTERNATE_RECIPIENT:

    ExtSubFileMetaConfig metaConfig = new ExtSubFileMetaConfig();
    ExtSubFileMetadata subfilemeta = null;
    // A Filter class constant and a raw MAPI tag number can be mixed in one array.
    int[] metaNameArray = {Filter.KVPR_SUBJECT, 0x3A010102};
    subfilemeta = m_objFilter.extGetSubFileMetadata(extContextID, index, metaNameArray, metaConfig);

Exclude Metadata from the Extracted Text File

When a mail message is extracted, the message text and header information (To, From, Sent, and so on) are also extracted. You can prevent the header information from appearing in the text file.

To exclude the header information, call the setExcludeMailHeader() method of the ExtSubFileExtractConfig object, and pass ExtSubFileExtractConfig to the extExtractSubFile method. For example:

    m_excludeMailHeader = true;
    extconfig = new ExtSubFileExtractConfig();
    extconfig.setExcludeMailHeader(m_excludeMailHeader);
    extinfo = m_objFilter.extExtractSubFile(extContextID, i, extconfig);

Extract Subfiles from Outlook Files

When you extract an Outlook file (MSG) to disk, the message text and header information (To, From, Sent, and so on) are extracted to a text file. (If you do not want the header information to appear in the text file, see Exclude Metadata from the Extracted Text File, above.)

If the Outlook file contains a non-mail attachment, the attachment is extracted in its native format to a subdirectory. If the Outlook file contains a mail attachment, the attachment's message text and any attachments are extracted to a subdirectory.

Extract Subfiles from Outlook Express Files

When you extract an Outlook Express (EML) file to disk, the message text and header information (To, From, Sent, and so on) are extracted to a text file. (If you do not want the header information to appear in the text file, see Exclude Metadata from the Extracted Text File, above.)

If the Outlook Express file contains a non-mail attachment, the attachment is extracted in its native format to the same directory as the message text file. If the Outlook Express file contains a mail attachment, the complete attachment (including message text and attachments), the message text file, and any non-mail attachments are extracted to the same directory as the main message.

NOTE: When the MBX reader (mbxsr) is enabled, it is used to filter MBX and EML files. If the MBX reader is not enabled, the EML reader (emlsr) is used.

Extract Subfiles from Mailbox Files

A Mailbox (MBX) file is a collection of individual emails formatted according to RFC 822 and RFC 2045 - 2049 (MIME), and divided by message separators. Many mail applications export to an MBX format, such as Eudora Email and Mozilla Thunderbird.

When an MBX file is extracted to disk, the message text and header information (To, From, Sent, and so on) from each mail file are extracted to text files. (If you do not want the header information to appear in the text file, see Exclude Metadata from the Extracted Text File, on the previous page.)

In Eudora MBX files, attachments are inserted as a link and are stored externally from the message. These attachments are not extracted, but the path to the attachment is returned in the call to the extGetSubFileInfo method. You can write code to retrieve the attachment based on the returned path.
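As a minimal sketch of that last step, the following helper copies an externally stored attachment next to the extracted output using only the standard java.nio.file API. It assumes you have already obtained the attachment path as a String from the subfile information. The class name LinkedAttachmentSketch, the method copyLinkedAttachment, and both parameter names are hypothetical and are not part of the KeyView API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

final class LinkedAttachmentSketch {
    /**
     * Copies an externally stored attachment into the extraction directory.
     * Returns the copied file, or null if the linked file no longer exists.
     */
    static Path copyLinkedAttachment(String returnedPath, Path extractDir) throws IOException {
        Path source = Paths.get(returnedPath);
        if (!Files.isRegularFile(source)) {
            return null; // the link is stale or points at something that is not a file
        }
        Files.createDirectories(extractDir);
        Path target = extractDir.resolve(source.getFileName());
        return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Returning null for a missing file keeps the decision about stale links with the caller, which matters because the path recorded in the mailbox might refer to a location that no longer exists on the machine doing the extraction.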
For MBX files from other clients, KeyView extracts attachments when they are embedded in the message.

NOTE: The Mailbox (MBX) reader is an advanced feature and is sold and licensed separately. To enable this reader in a KeyView SDK, you must obtain the appropriate license key from Micro Focus.

Extract Subfiles from Outlook Personal Folders Files

KeyView can extract Outlook items such as messages, appointments, contacts, tasks, notes, and journal entries from a PST file. When a PST file is extracted to disk, the body text and header information (To, From, Sent, and so on) from each Outlook item is extracted to a text file. (If you do not want the header information to appear in the text file, see Exclude Metadata from the Extracted Text File, on the previous page.) You can also extract messages from PST files as MSG files, including all their attachments, using the setSaveAsMSG() method in the ExtSubFileExtractConfig class.

If an Outlook item contains a non-mail attachment, the attachment is extracted in its native format to a subdirectory. If an Outlook item contains an Outlook attachment, the attached item's body text and any attachments are extracted to a subdirectory.

NOTE: The Microsoft Outlook Personal Folders (PST) readers are an advanced feature and are sold and licensed separately. To enable these readers in a KeyView SDK, you must obtain an appropriate license key from Micro Focus. For information about adding a new license key to an existing installation, see Pass License Information to KeyView, on page 18.

Choose the Reader to Use for PST Files

KeyView provides several ways of processing PST files:

• Indirectly, using the Microsoft Messaging Application Programming Interface (MAPI). MAPI is a Microsoft interface that enables different applications to exchange messages and attachments with each other. MAPI allows KeyView to open a PST file, traverse the folders, and extract items. The pstsr reader uses MAPI, but works only on Windows and requires that Microsoft Outlook is installed.

• Directly, without relying on the Microsoft interface to the PST format. Accessing the file directly does not require Microsoft Outlook. The pstxsr reader is available only on certain platforms (see pstxsr in the platform differences section). The pstnsr reader is an alternative native reader, for the platforms not supported by pstxsr.

On Windows, the MAPI-based reader is used by default but you can choose pstxsr if you prefer. On non-Windows platforms, only one of the native readers is available.

The differences between the readers are summarized in the following table.

Feature Platforms supported Outlook required MAPI properties supported Password protection supported Compressible encryption supported High encryption supported Native Reader (pstxsr) Native Reader (pstnsr) MAPI-based Reader (pstsr) Windows x86 and x64 Linux x64 and AArch64 All platforms not supported by pstxsr Windows x86 and x64 No No Yes Yes. All properties defined in mapitags.h. Object properties are not supported. Yes Yes Yes (using KVCredential structure) Yes Yes Yes No No Yes

To change the reader used to process PST files, change the PST entry (file category value 297) in the formats.ini file. For example, to use pstxsr:

    297=pstx

NOTE: You must make sure that the PST that you are extracting is not open in the Outlook client, and that the Outlook process is not running.

NOTE: When extracting subfiles from PST files, information on the distribution list used in an email is extracted to a file called emailname.dist. This applies to the MAPI reader (pstsr) only.

System Requirements

MAPI is supported on Windows platforms only and relies on functionality in Outlook.
If you want to use the MAPI-based reader, pstsr, Microsoft Outlook must be installed on the same machine as your application. Outlook must also be the default email application.

KeyView supports the following PST formats and Outlook clients:

• Outlook 97 or later PST files

  NOTE: The Outlook client must be the same version as, or newer than, the version of Outlook that generated the PST file.

• Outlook 2002 or later clients

  NOTE: You must install an edition of Microsoft Outlook (32-bit or 64-bit) that matches the KeyView software. For example, if you use 32-bit KeyView, install 32-bit Outlook. If you use 64-bit KeyView, install 64-bit Outlook. If the editions do not match, KeyView returns Error 32: KVError_PSTAccessFailed and an error message from Microsoft Office Outlook is displayed:

  Either there is no default mail client or the current mail client cannot fulfill the messaging request. Please run Microsoft Outlook and set it as the default mail client.

MAPI Attachment Methods

The way in which you can access the contents of a PST message attachment is determined by the MAPI attachment method applied to the attachment. For example, if the attachment is an embedded OLE object, it uses the ATTACH_OLE attachment method. KeyView can access message attachments that use the following attachment methods:

ATTACH_BY_VALUE, ATTACH_EMBEDDED_MSG, ATTACH_OLE, ATTACH_BY_REFERENCE, ATTACH_BY_REF_ONLY, ATTACH_BY_REF_RESOLVE

Attachments using the ATTACH_BY_VALUE, ATTACH_EMBEDDED_MSG, or ATTACH_OLE attachment methods are extracted automatically when the PST file is extracted. An "attach by reference" method means that the attachment is not in Outlook, but Outlook contains an absolute path to the attachment. Before you can extract these types of attachments, you must retrieve the path to access the attachment.

To extract "attach by reference" attachments

1. Determine whether the attachment uses an ATTACH_BY_REFERENCE, ATTACH_BY_REF_ONLY, or ATTACH_BY_REF_RESOLVE method by retrieving the MAPI property PR_ATTACH_METHOD.
2. If the attachment uses one of the "attach by reference" methods, get the fully qualified path to the attachment by retrieving the MAPI properties PR_ATTACH_LONG_PATHNAME or PR_ATTACH_PATHNAME.
3. You can then either copy the files from their original location to the path where the PST file is extracted, or use the Filter API methods to filter the attachment.

Open Secured PST Files

KeyView enables you to specify credentials (user name and password), which are used to open a secured PST file for extraction. See Password Protected Files, on page 273 for more information.

Detect PST Files While the Outlook Client is Running

If you are running an Outlook client while running the File Extraction API, the KeyView format detection module (kwad) might not be able to open the PST file to determine the file's format because Outlook has the file locked. In this case, you can do one of the following:

• Close Outlook when using the Extraction API.
• Detect PST files by extension only and bypass the format detection module. To enable this option, add the following lines to the formats.ini file.

    [container_flags]
    detectPSTbyExtension=1

NOTE: The detectPSTbyExtension option only applies when you are using the MAPI reader (pstsr).

NOTE: If you use this option, you must make sure in your code that valid PST files are passed to KeyView because the format detection module will not be available to verify the file type and pass the file to the appropriate reader.

Extract Subfiles from Lotus Domino XML Language Files

When you extract a Lotus Domino XML Language (.DXL) file, the message text and header information (To, From, Sent, and so on) are extracted to a text file.
NOTE: To prevent header information from being extracted, see Exclude Metadata from the Extracted Text File, on page 44.

You can make sure that dates and times extracted from Lotus Domino .DXL files are displayed in a uniform format.

To extract custom date/time formats

• In the formats.ini file, set the DateTimeFormat option in the [dxlsr] section. For example:

    [dxlsr]
    DateTimeFormat=%m/%d/%Y %I:%M:%S %p

In this example, dates and times are extracted in the following format:

    02/11/2003 11:36:09 AM

The format arguments are the same as those for the strftime() function. See http://msdn.microsoft.com/en-us/library/fe06s4ak%28VS.71%29.aspx for more information.

Extract .DXL Files to HTML

You can use the file extraction API to process .DXL files with an XSLT engine. The XSLT engine then transforms the extracted .DXL to .mail HTML files.

To extract .DXL files to HTML

• Set the following options in the formats.ini file:

    [nsfsr]
    ExportDXL=1
    ExportDXL_PureXML=1

    [dxlsr]
    LNDParser=2

Extract Subfiles from Lotus Notes Database Files

A Lotus Notes database is a single file that contains multiple documents called notes. Notes include design notes (such as forms, views, folders, navigators, outlines, pages, framesets, agents, and resources), data document notes, profile document notes, access control list notes, and collection (index) notes.

KeyView can extract text items, attachments, and OLE objects from data document notes only. Data document notes include emails, journal entries, discussion threads, documents (Microsoft Office and Lotus SmartSuite), and so on. All components of a note are prefixed by field names such as "SendTo:", "Subject:", and "Body:". When a note is extracted, the field names are not included in the extracted output; only the field values are extracted.
When a mail message in an NSF file is extracted to disk, the body text and header information (such as the values from the SendTo, From, and DeliveredDate fields) in each message is extracted to a text file. (If you do not want the header information to appear in the message text file, see Exclude Metadata from the Extracted Text File, on page 44.)

NOTE: The Lotus Notes Database (NSF) reader is an advanced feature and is sold and licensed separately. To enable this reader in a KeyView SDK, you must obtain the appropriate license key from Micro Focus.

System Requirements

The Lotus Notes Database (NSF) reader is available only on certain platforms (see nsfsr in the platform differences section).

KeyView accesses NSF files indirectly by using the Lotus Notes API. Because the NSF reader relies on functionality in Lotus Notes, a Notes client or Domino server must be installed and configured on the same machine as KeyView. On UNIX and Linux, the Domino server is required. On Windows, the Notes client or Domino server is required. For information about the supported versions of Notes or Domino, see Software Dependencies, on page 14.

Installation and Configuration

Before KeyView can filter NSF files, you must set up the Notes client or Domino server. Full configuration is not required. The following steps outline the minimal setup for NSF filtering:

Windows

1. Install the Lotus Notes client or Lotus Domino server. You do not need to configure the client or server.
2. Make sure that the notes.ini file is in the proper location.
   • If Lotus Notes is installed, the file should appear in the install\lotus\notes directory, where install is the installation directory.
   • If only Lotus Domino is installed, the file should appear in the install\lotus\domino directory, where install is the installation directory.
   If the file does not exist, create an ASCII file named notes.ini, and add the following text:

    [Notes]

3. Add the KeyView bin directory and the install\lotus\notes or install\lotus\domino directory to the PATH environment variable (the KeyView bin directory must be first in the path). Micro Focus recommends that you add the KeyView bin directory because the Lotus Notes or Domino server installation might contain older KeyView OEM libraries.

Linux

1. Install Lotus Domino server. You do not need to configure the server.
2. Make sure that the notes.ini file is in the install/lotus/notes/latest/linux directory, where install is the directory where Lotus Notes is installed. If the file does not exist, create an ASCII file named notes.ini, and add the following text:

    [Notes]

3. Add the install/lotus/notes/latest/linux directory to the PATH environment variable:

    setenv PATH install/lotus/notes/latest/linux:$PATH

4. Add the install/lotus/notes/latest/linux and the KeyView bin directory to the LD_LIBRARY_PATH environment variable:

    setenv LD_LIBRARY_PATH keyview_bin:install/lotus/notes/latest/linux:$LD_LIBRARY_PATH

   where keyview_bin is the location of the KeyView bin directory. Micro Focus recommends that you add the KeyView bin directory because the Lotus Notes installation might contain older KeyView OEM libraries.

AIX 5.x

1. Install the bos.iocp.rte file set if it is not already installed, and reboot the machine. See the Lotus Domino server documentation for more information.
2. Install Lotus Domino server. You do not need to configure the server.
3. Make sure that the notes.ini file is in the install/lotus/notes/latest/ibmpow directory, where install is the directory where Lotus Notes is installed. If the file does not exist, create an ASCII file named notes.ini, and add the following text:

    [Notes]

4. Add the install/lotus/notes/latest/ibmpow directory to the PATH environment variable:

    setenv PATH install/lotus/notes/latest/ibmpow:$PATH

5. Add the install/lotus/notes/latest/ibmpow and the KeyView bin directory to the LIBPATH environment variable:

    setenv LIBPATH keyview_bin:install/lotus/notes/latest/ibmpow:$LIBPATH

   where keyview_bin is the location of the KeyView bin directory. Micro Focus recommends that you add the KeyView bin directory because the Lotus Notes installation might contain older KeyView OEM libraries.

Open Secured NSF Files

KeyView enables you to specify a user ID file and password to use to open a secured NSF file for extraction. See Password Protected Files, on page 273 for more information.

Format Note Subfiles

The KeyView NSF reader uses XML templates to format note subfiles. You can customize the templates to approximate the look and feel of the original notes as closely as possible. For more information, see Extract and Format Lotus Notes Subfiles, on page 225.

Extract Subfiles from PDF Files

KeyView can extract document-level and page-level attachments from a PDF document. Document-level attachments are added by using the Attach A File tool, and can include links to or from the parent document or to other file attachments. Page-level attachments are added as comments by using various tools. Page-level or comment attachments display the File Attachment icon or the Speaker icon on the page where they are located. KeyView can also extract the files from Portfolio PDFs.

When a PDF file is extracted to disk, the PDF file is extracted to a directory and the PDF's attachments are saved in their native format to the same directory as the original PDF file.
Improve Performance for PDFs with Many Small Images

To improve performance when processing PDF files that contain many small images, you can choose to ignore images unless they exceed a minimum width and/or height. If an image is smaller than the minimum width or height, KeyView does not extract the image.

For example, to ignore images that are less than 16 pixels wide or less than 16 pixels in height, add the following to the [pdf_flags] section of the formats.ini file:

    [pdf_flags]
    process_images_with_min_width=16
    process_images_with_min_height=16

Extract Embedded OLE Objects

The File Extraction API can extract embedded OLE objects from the following types of documents:

• Lotus Notes (DXL)
• Microsoft Excel
• Microsoft Word
• Microsoft PowerPoint
• Microsoft Outlook
• Microsoft Visio
• Microsoft Project
• OASIS Open Document
• Rich Text Format (RTF)

When an embedded OLE object is extracted from its parent file, the location of the embedded file in the original document is not available. The parent and child are extracted as separate files.

Extract Subfiles from ZIP Files

You can extract ZIP files that are not password-protected by using the general method (see Extract Subfiles, on page 34). However, some ZIP files use password protection, in which case you must use a different method to enter the required credentials. See Password Protected Files, on page 273 for more information.

Default File Names for Extracted Subfiles

When a file name is not specified in the call to extExtractSubFile, in some cases, a default file name is applied to the extracted subfile.

Default File Name for Mail Formats

To avoid naming conflicts and problems with long file names, KeyView applies its own names to the extracted mail folders and mail items when a name is not supplied in the call to extExtractSubFile. A non-mail attachment retains its original file name and extension.

When the contents of a mail store or the message body of a mail message are extracted, the extracted file names might include the following:

• The first valid eight characters of the original folder name or "Subject" line of the mail message. If the "Subject" line is empty, the characters kvext are used, where ext is the format's extension. For example, the characters would be "kvmsg" for MSG, and "kvnsf" for NSF.

  The following special characters are considered invalid and are ignored:
  any non-printing character with a value less than 0x1F
  angle brackets (< >)
  double quotation mark (")
  asterisk (*)
  forward slash (/)
  back slash (\)
  pipe (|)
  colon (:)
  question mark (?)

  For notes, the file name is derived from the first 24 characters of the note text. For contact entries, the file name is derived from the full name of the contact.

• The characters _kvn, where n is an integer incremented from 0 for each extracted item.

• One of the following extensions:

  Type: File Extension
  email message: .mail
  calendar appointment: .cal
  contact entry: .cont
  task entry: .task
  note: .note
  journal entry: .jrnl
  distribution list: .dist
  posting note: .post

  If the type cannot be determined for an MSG or PST file, the file is given a .mail extension. If the type cannot be determined for an NSF file, the file is given a .tmp extension.

For example, an MSG mail message with the subject line "RE: Product roadmap" that contains the Microsoft Excel attachment release_schedule.xls is extracted as

    RE produ_kv0.mail
    release_schedule.xls

If an extracted message contains an embedded OLE object or any attachment that does not have a name, the object or attachment is extracted as _kv#.tmp.

Default File Name for Embedded OLE Objects

KeyView can apply a default name to an extracted embedded OLE object when a name is not supplied in the call to extExtractSubFile.
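To make the mail-item naming rules in Default File Name for Mail Formats concrete, here is a rough Java sketch of how such a name could be assembled: drop the invalid characters, keep the first eight that remain, then append _kvn and the type extension. It illustrates the documented rules only and is not KeyView's implementation. The class DefaultNameSketch and its methods are hypothetical, and falling back to kvext when no valid characters remain (rather than only when the subject is empty) is an assumption.

```java
final class DefaultNameSketch {
    // Characters the guide lists as invalid, in addition to control characters.
    private static final String INVALID = "<>\"*/\\|:?";

    /** Keeps the first eight characters that the naming rules treat as valid. */
    static String firstValidEight(String text) {
        StringBuilder kept = new StringBuilder();
        for (int i = 0; i < text.length() && kept.length() < 8; i++) {
            char c = text.charAt(i);
            // The guide excludes non-printing characters with a value less than 0x1F.
            if (c >= 0x1F && INVALID.indexOf(c) < 0) {
                kept.append(c);
            }
        }
        return kept.toString();
    }

    /**
     * Builds a default name such as "Q3 budge_kv0.mail".
     * formatExt is the container extension without a dot (for example "msg");
     * typeExt is the item extension with a dot (for example ".mail").
     */
    static String defaultMailName(String subject, String formatExt, int n, String typeExt) {
        String stem = firstValidEight(subject);
        if (stem.isEmpty()) {
            // Assumption: the kv<ext> fallback also applies when nothing valid is left.
            stem = "kv" + formatExt;
        }
        return stem + "_kv" + n + typeExt;
    }

    public static void main(String[] args) {
        System.out.println(defaultMailName("Q3: budget/plan?", "msg", 0, ".mail")); // Q3 budge_kv0.mail
        System.out.println(defaultMailName("", "msg", 1, ".mail"));                 // kvmsg_kv1.mail
    }
}
```

Embedded OLE objects use the same eight-character stem and _kvn counter, with the stem taken from the name of the main file instead of the subject line.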
When an embedded OLE object is extracted, the extracted file name might include the following:

• The first valid eight characters of the main file. The following special characters are considered invalid and are ignored:
  any non-printing character with a value less than 0x1F
  angle brackets (< >)
  double quotation mark (")
  asterisk (*)
  forward slash (/)
  back slash (\)
  pipe (|)
  colon (:)
  question mark (?)

• The characters _kvn, where n is an integer incremented from 0 for each extracted object.

• If KeyView can determine the embedded OLE is a Microsoft Office document, the original extension is used. If the file type cannot be determined, the file is given a .tmp extension.

For example, suppose that a Microsoft Word document (sales_quarterly.doc) contains two embedded OLE objects: a Microsoft Excel file called west_region.xls, and a bitmap created in the Word document. The embedded objects would be extracted as

    sales_qu_kv0.xls
    sales_qu_kv1.tmp

Chapter 4: Use the Filter API

This section describes how to perform some basic filtering tasks by using the Filter API.

· Generate an Error Log 55
· Extract Metadata 59
· Convert Character Sets 62
· Extract Tracked Deleted Text 64
· Filter PDF Files 64
· Filter Spreadsheet Files 71
· Filter Presentation Files to a Logical Reading Order 75
· Filter HTML Files 76
· Filter XML Files 76
· Configure Headers and Footers 81
· Error Messages 81
· Tab Delimited Output for Spreadsheets and Embedded Tables 84
· Exclude Japanese Guide Text 85
· Source Code Identification 85
· Optical Character Recognition 86
· Configure the Proxy for RMS 88
· Document Restrictions 89

Generate an Error Log

You can monitor and debug filtering operations by enabling a detailed error log. This allows you to see errors that are generated at run time and to track problem files in stream or file mode.

NOTE: Error logs are not generated when in-process filtering is enabled.

The error log might include the following information:

• Generated error messages.
• Time stamp.
• Path and file name of the file in which the error occurred.
• Length of the file in which the error occurred. If neither the name of the original file nor the name of the temporary file is obtained in stream mode, the file length is reported.

The following is a sample log file:

    -KVOOPE 12 # Time: 11:14:32 # File Len = 68140
    -KVOOPE 13 # Time: 11:23:05 # H:\files\WP\Word97\fnldmsa.doc
    -KVOOPE 5 # Time: 12:15:54 # H:\files\SS\XL2000\corporate.xsl
    -KVOOPE 5 # Time: 12:45:19 # H:\files\WP\WPerf5\wp501.doc
    -KVOOPE 12 # Time: 14:25:33 # H:\files\PG\PPoint95\95.ppt
    -KVOOPE 26 # Time: 16:26:04 # File Len = 19117568
    -KVOOPE 10 # Time: 20:27:40 # File Len = 19117568

You can specify the information that is written to the log file using either the API or environment variables. To configure a log file for a single filtering session, use environment variables. To configure a log file for all filtering sessions, use the API. Configuring the log file using the API overrides the same settings in the environment variables. You can also specify additional settings in the formats.ini file.

You can configure the following features of the log file:

• Enable or disable logging. See Enable or Disable Error Logging, below.
• Change the default path and file name of the log file. See Change the Path and File Name of the Log File, on the next page.
• Include memory errors in the log file. See Report Memory Errors, on the next page.
• Specify a memory guard that is used to generate memory overwrite errors in the log. See Specify a Memory Guard, on the next page.
• Include the input file name in the log file when filtering a stream. See Report the File Name in Stream Mode, on page 58.
• Specify the maximum size of the log file. See Specify the Maximum Size of the Log File, on page 58.
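If you want to post-process the log, for example to count errors by code, each entry can be split into its parts with a regular expression. The following sketch is based solely on the layout of the sample log shown earlier in this section; the exact format is not specified in this guide, so treat the pattern as an assumption. The class name KvoopLogLineSketch and its fields are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class KvoopLogLineSketch {
    // Layout inferred from the sample log only; the leading '-' is treated as optional.
    private static final Pattern LINE = Pattern.compile(
            "^-?KVOOPE\\s+(\\d+)\\s+#\\s+Time:\\s+(\\d{2}:\\d{2}:\\d{2})\\s+#\\s+(.+)$");
    private static final Pattern FILE_LEN = Pattern.compile("^File Len\\s*=\\s*(\\d+)$");

    final int errorCode;
    final String time;
    final String path;      // null when only the file length was logged
    final long fileLength;  // -1 when a path was logged instead

    private KvoopLogLineSketch(int errorCode, String time, String path, long fileLength) {
        this.errorCode = errorCode;
        this.time = time;
        this.path = path;
        this.fileLength = fileLength;
    }

    /** Parses one log entry, or returns null if the line does not match the assumed layout. */
    static KvoopLogLineSketch parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        int code = Integer.parseInt(m.group(1));
        String detail = m.group(3).trim();
        Matcher len = FILE_LEN.matcher(detail);
        if (len.matches()) {
            return new KvoopLogLineSketch(code, m.group(2), null, Long.parseLong(len.group(1)));
        }
        return new KvoopLogLineSketch(code, m.group(2), detail, -1L);
    }

    public static void main(String[] args) {
        KvoopLogLineSketch entry = parse("-KVOOPE 12 # Time: 11:14:32 # File Len = 68140");
        System.out.println(entry.errorCode + " at " + entry.time + ", length " + entry.fileLength);
    }
}
```

Entries that carry a path and entries that carry only a file length are kept distinct, because the length-only form is what the log contains when the file name is not available in stream mode.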
Enable or Disable Error Logging

You can enable or disable error logging using either the API or environment variables. By default, a file called kvoop.log is created in the system temporary directory; however, you can change the path and file name of this file (see Change the Path and File Name of the Log File, on the next page).

Use the API

To enable or disable logging in the API, instantiate the Filter object using the constructor Filter(java.lang.String outputCharSet, long filterFlags), and set the filterFlags argument to either FILTERFLAG_OOPLOGON or FILTERFLAG_OOPLOGOFF. For example:

    objFilter = new Filter(outputCharSet, Filter.FILTERFLAG_OOPLOGON);

Use Environment Variables

To enable logging, add the environment variable KVOOPLOGON, and set the variable value to 1. To disable logging, do not set the environment variable KVOOPLOGON.

Change the Path and File Name of the Log File

You can change the default path and file name of the log file. The default is C:\temp\kvoop.log on Windows and /tmp/kvoop.log on UNIX. To change the path and file name of the log file, add the following to the formats.ini file:

    [kvooplog]
    KvoopLogName=filepath

The formats.ini file is in the directory install\OS\bin, where install is the path name of the Filter installation directory and OS is the name of the operating system.

Report Memory Errors

You can report memory leaks and memory overwrites in the log file by enabling the memory trace system, either by using the API or environment variables. If the memory trace system is enabled, the error messages for memory leaks and memory overwrites (KVError_MemoryLeak and KVError_MemoryOverwrite, respectively) are reported in the log file when they are generated. The error messages are listed in Error Messages, on page 81.

NOTE: To report memory overwrites, you must also set a memory guard. See Specify a Memory Guard, below.
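Because environment variables apply to a whole process, one way to limit logging to a single filtering session is to start the filtering application as a child process with KVOOPLOGON set only for that run. The following is a minimal sketch using the standard java.lang.ProcessBuilder class. The class name OopLoggingLauncher, the JAR name app.jar, and the application class MyFilterApp are hypothetical; only the variable name KVOOPLOGON and the value 1 come from this guide.

```java
import java.util.Arrays;
import java.util.List;

/** Prepares a child process whose environment enables the error log (hypothetical helper). */
final class OopLoggingLauncher {

    /** Returns a ProcessBuilder for the given command with KVOOPLOGON=1 added to its environment. */
    static ProcessBuilder withErrorLogging(List<String> command) {
        ProcessBuilder builder = new ProcessBuilder(command);
        // The variable is set for the child process only; the parent environment is unchanged.
        builder.environment().put("KVOOPLOGON", "1");
        builder.inheritIO(); // show the child's console output in the parent's console
        return builder;
    }

    public static void main(String[] args) {
        // "app.jar" and "MyFilterApp" are placeholders for your own filtering application.
        ProcessBuilder builder =
                withErrorLogging(Arrays.asList("java", "-cp", "app.jar", "MyFilterApp"));
        System.out.println("KVOOPLOGON=" + builder.environment().get("KVOOPLOGON"));
        // To launch the session, call builder.start() and wait for the process to finish.
    }
}
```

The same pattern could carry any other environment variable that a single session needs, without changing system-wide settings.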
Use the API

To enable or disable the memory trace system in the API, instantiate the Filter object using the constructor Filter(java.lang.String outputCharSet, long filterFlags), and set the filterFlags argument to either FILTERFLAG_OOPMEMTRACEON or FILTERFLAG_OOPMEMTRACEOFF. For example:

    objFilter = new Filter(outputCharSet, Filter.FILTERFLAG_OOPMEMTRACEON);

Use Environment Variables

To enable the memory trace system, add the KVOOPMT environment variable, and set its value to 1. To disable the memory trace system, do not set the KVOOPMT environment variable.

Specify a Memory Guard

To report memory overwrites in the log file, you must set a memory guard that protects against memory overwrites. Normally, this is set in the range of 100-200 bytes. For example, if a memory guard of 100 is set and 20 bytes of memory are specified, a total of 120 bytes of memory are allocated. The additional memory is used to monitor and identify memory overwrites.

To configure the memory guard, add the following section to the formats.ini file:

    [Kvooplog]
    mg=100

Report the File Name in Stream Mode

When you run Filter in file mode, the file name is always reported in the log file. To report the file name in stream mode, you must extract it through the API.

To add the input file name to the log

1. Create an instance of ConfigOption with the following arguments:
   a. Set the OptionType to CFG_SETOOPSRCFILE.
   b. Set the OptionValue to 0.
   c. Set OptionData to the input_filename.
2. Call the setConfigOption method, and pass in the ConfigOption instance.

Example

    if((filterFlags & Filter.FILTERFLAG_OOPLOGON) == Filter.FILTERFLAG_OOPLOGON)
    {
        ConfigOption config = new ConfigOption(Filter.CFG_SETOOPSRCFILE, 0, inFile);
        objFilter.setConfigOption(config);
    }

Specify the Maximum Size of the Log File

You can specify the maximum size of the log file.
When this size is reached and new entries are logged, either the first entry in the file is overwritten or the new entries are not reported.

To configure the maximum log size and whether old entries are overwritten, add the following section to the formats.ini file:

    [Kvooplog]
    LogFileSize=10
    OverWriteLog=1

Option         Description
LogFileSize    This option specifies the maximum size of the log file in KB. The minimum is 1 KB. If a size is not specified, the default 2 MB is used.
OverWriteLog   This option determines whether the log file is overwritten when the maximum log file size (LogFileSize) is reached. If you set this option to 1, the first entry in the log file is overwritten. If you set this option to 0, new entries are not reported in the log file.

Extract Metadata

When a file format supports metadata, KeyView can extract and process that information. Metadata includes document information fields such as title, author, creation date, and file size. Depending on the file's format, metadata is referred to in a number of ways: for example, "summary information," "OLE summary information," "file information," and "document properties."

The metadata in mail formats (MSG and EML) and mail stores (PST, NSF, and MBX) is extracted differently than other formats. For information on extracting metadata from these formats, see Extract Mail Metadata, on page 38.

NOTE: KeyView can only extract metadata from a document if metadata is defined in the document, and if the document reader can extract metadata for the file format. The section Document Readers, on page 177 lists the file formats for which metadata can be extracted. KeyView does not generate metadata automatically from the document contents.

The sample program FilterTest demonstrates how to extract metadata. See Sample Programs, on page 91.

Extract Metadata for File Filtering

To extract metadata for file filtering

1. Optionally, set the input source using the setInputSource(java.lang.String inFile) method of the Filter object.
2. If the input source was set in step 1, call the getSummaryInfo() method of the Filter object to retrieve an object of the SummaryInfo class. Otherwise, call the getSummaryInfo(java.lang.String inFile) method.
3. Use the methods of the SummaryInfo object to retrieve the metadata information.

Extract Metadata for Stream Filtering

To extract metadata for stream filtering

1. Optionally, set the input source using one of the following methods of the Filter object:
   · setInputSource(com.verity.api.SeekableInputStream input)
   · setInputSource(java.io.InputStream input, long size)
   · setInputSource(java.io.InputStream input)
2. If you set the input source in step 1, call the getSummaryInfo() method of the Filter object to retrieve an object of the SummaryInfo class. Otherwise, call one of the following methods:
   · getSummaryInfo(com.verity.api.SeekableInputStream input)
   · getSummaryInfo(java.io.InputStream input, long size)
   · getSummaryInfo(java.io.InputStream input)
3. Use the methods of the SummaryInfo object to retrieve the metadata information.

TIP: Micro Focus recommends that you provide a SeekableInputStream. See Input/Output Operations, on page 23.
Example

Below is an example of a call to getSummaryInfo():

    SummaryInfo[] sinfo = objFilter.getSummaryInfo();
    if(sinfo != null)
    {
        System.out.println("\nSummary info has been extracted.");
        fos_sum = new FileOutputStream(summaryOutFile);
        DataOutputStream dos_sum = new DataOutputStream(fos_sum);
        for(int i=0; i<sinfo.length; i++)
        {
            // Skip elements that have no name.
            if(sinfo[i].getElementName() != null)
            {
                dos_sum.writeBytes("Element name: " + sinfo[i].getElementName() + "\n");
                dos_sum.writeBytes("Element type: " + sinfo[i].getSumInfoType() + "\n");
                // Write a value only if it is present in the document.
                if(sinfo[i].getIsValid())
                {
                    if(sinfo[i].isDateTimeType())
                    {
                        dos_sum.writeBytes("Date/time: ");
                        dos_sum.writeBytes(sinfo[i].getDateTime());
                    }
                    else
                    {
                        byte[] data = sinfo[i].getData();
                        if(data != null)
                        {
                            dos_sum.writeBytes("Element data: ");
                            dos_sum.write(data);
                        }
                    }
                }
                dos_sum.writeBytes("\n\n");
            }
        }
        dos_sum.close();
        fos_sum.close();
    }
    sinfo=null;

The SummaryInfo class stores the metadata extraction results. After calling the Filter.getSummaryInfo() method, call the get methods provided by each instance of this class to extract metadata:

getElementName()
    Gets the name of the metadata element.

getSumInfoType()
    Specifies the data type of the metadata element. The possible types are:
    · KV_String--The value in the metadata field is a string.
    · KV_Int4--The value in the metadata field is an integer.
    · KV_DateTime--The value in the metadata field is a date and time. This type is a 64-bit value representing the number of 100-nanosecond intervals since January 1, 1601 (Windows FILETIME EPOCH). You might need to convert this value into another format. You can also use the isDateTimeType() method to determine whether a metadata element is of date/time type, and then use the getDateTime() method to obtain the date/time in the form of a string.
    · KV_ClipBoard--Currently not supported.
    · KV_Bool--The value in the metadata field is a boolean.
    · KV_Unicode--The value in the metadata field is a Unicode string.
    · KV_IEEE8--The value in the metadata field is an IEEE 8-byte integer.
    · KV_Other--The value in the metadata field is user-defined.

getIsValid()
    Specifies whether the data value is present in the document. true specifies that the value is valid. For example, if the "Title" element was not populated in the document, getIsValid would return false.

isDateTimeType()
    Determines whether the metadata element is of date/time type.

getDateTime()
    Gets the date and time in the form of a string. If the metadata element is of KV_DateTime type, call this method to get the date and time in the form of a string, for example "Wed Jun 30 21:49:08 1993" or "135 Minutes".

getData()
    Gets the content of the element. If type is KV_Int4 or KV_Bool, data contains the actual value. Otherwise, data is a pointer to the actual value. KV_DateTime and KV_IEEE8 point to an 8-byte value. KV_String and KV_Unicode point to the beginning of the string that contains the text. KV_Unicode is replaced with KV_String when the UNICODE value has been character mapped to the desired output character set.

Convert Character Sets

Filter can convert the character set of a source document to an arbitrary character set specified in the API, or to the character set of the operating system on which the output text is viewed. For this conversion to occur, a source character set must be identified. The source character set can either be determined by the document reader, or can be set in the API. The section Document Readers, on page 177 lists file formats for which character set information can be determined by the document reader. The character sets are defined as constants in the Filter class.
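To illustrate why a source character set must be identified before conversion can occur, the following minimal sketch uses only the standard JDK charset classes, not the KeyView API. The class name SourceCharsetDemo is hypothetical, and ISO-8859-1 and UTF-8 are chosen here only because every Java runtime supports them. The same bytes convert cleanly when the source character set is known, and are damaged when the wrong one is assumed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Shows that byte-level conversion depends on knowing the source character set (hypothetical demo). */
final class SourceCharsetDemo {

    /** Decodes the input with the source character set, then encodes it in the target character set. */
    static byte[] convert(byte[] input, Charset source, Charset target) {
        return new String(input, source).getBytes(target);
    }

    public static void main(String[] args) {
        // The word "café" encoded in ISO-8859-1: the final byte 0xE9 is the letter é.
        byte[] latin1 = { 0x63, 0x61, 0x66, (byte) 0xE9 };

        // Known source character set: é becomes the two-byte UTF-8 sequence 0xC3 0xA9.
        byte[] utf8 = convert(latin1, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println("Converted length: " + utf8.length);

        // Wrong source character set: 0xE9 alone is not valid UTF-8, so the decoder
        // substitutes the replacement character U+FFFD and the original letter is lost.
        String wrong = new String(latin1, StandardCharsets.UTF_8);
        System.out.println("Contains replacement character: " + (wrong.indexOf('\uFFFD') >= 0));
    }
}
```

This is the same reason Filter cannot convert a document to a target character set when neither the document reader nor the API supplies the source character set.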
Determine the Character Set of the Output Text

To determine the output character set of a filtered document, Filter considers the following:

· Whether the document reader can determine the character set of the file format. If the document reader cannot determine the character set information for the document type, set the source character set in the API.
· Whether the source character set is specified in the API.
· Whether the target character set is specified in the API.

Guidelines for Character Set Conversion

Below are some rules for the determination of character set mapping:

· If the source is not determined by the document reader or configured in the API, then the character set of the output text is always unknown, regardless of the target character set configuration. The document cannot be converted to a target character set or the operating system's code page unless the source character set is known.
· If the target character set is not specified in the API, and the source character set is identified, then the operating system's code page is used for the output text.
· If the source character set is identified, and the target character set is specified in the API, then the target character set specified in the API is used for the output text.
· For documents that contain multiple character sets, Micro Focus recommends that the target character set be forced to UNICODE or UTF-8.

The following table illustrates how Filter determines the character set of the output text.
Determining the Output Character Set--Example

Source charset    Source charset      Target charset      Output charset
read by Filter    specified in API    specified in API
No                No                  No                  no conversion
No                KVCS_936            No                  OS code page
No                No                  UNICODE             no conversion
No                KVCS_936            UNICODE             UNICODE
Yes               No                  No                  OS code page
Yes               KVCS_936            No                  OS code page
Yes               No                  UNICODE             UNICODE
Yes               KVCS_936            UNICODE             UNICODE

Set the Character Set During Filtering

You can convert the character set of a file at the time the file is filtered.

To specify the source character set, use the setSourceCharSet(java.lang.String charset) method. For example:

    objFilter.setSourceCharSet(sourceCharSet);

To specify the target character set, instantiate the Filter object using the constructor Filter(java.lang.String outputCharSet, long filterFlags). For example:

    objFilter = new Filter(outputCharSet, filterFlags);

Set the Character Set During Subfile Extraction

You can convert the character set of a subfile at the time the subfile is extracted from the container and before it is filtered. This is most often used to set the character set of a mail message's body text. See Filter PDF Files, on the next page for more information.

To specify the source and target character set of a subfile

1. Use the methods of the ExtSubFileExtractConfig object to set the source and target character set.
2. Call the extExtractSubFile method of the Filter object and pass in the ExtSubFileExtractConfig object.
For example:

    extconfig = new ExtSubFileExtractConfig();
    extconfig.setSourceCharset(m_sourceCharSet);
    extconfig.setTargetCharset(m_outputCharSet);
    extinfo = m_objFilter.extExtractSubFile(extContextID, i, extconfig);

Prevent the Default Conversion of a Character Set

You can prevent the default conversion of text to the operating system code page, and specify that Filter retain the original character encoding of the document when it is available. Any document identified as containing more than one character encoding is converted to the first encoding encountered in the file.

To prevent the default conversion, instantiate the Filter object using the constructor Filter(java.lang.String outputCharSet, long filterFlags), and set the filterFlags argument to FILTERFLAG_NODEFAULTCHARSETCONVERT. For example:

    objFilter = new Filter(outputCharSet, Filter.FILTERFLAG_NODEFAULTCHARSETCONVERT);

This setting overrides the source or target character set specified in the API.

Extract Tracked Deleted Text

The revision tracking feature in applications--such as Microsoft Word's Track Changes--marks changes to a document (typically, strikethrough for deleted text and underline for inserted text) and tracks each change by reviewer name and date. If revision tracking was enabled when text was deleted from a source document, you can configure Filter to extract the deleted text. Filter does not extract the reviewer name and revision date. Deleted text is excluded from the filtered output by default.

To extract deleted text from a document and include it in the filtered output, call the includeRevisionMark method. For example:

    if(inclRevisionMark == true)
    {
        objFilter.includeRevisionMark();
    }

To reset the flag and exclude deleted text from the filtered output, call the excludeRevisionMark method.
For example:

    if(inclRevisionMark == false)
    {
        objFilter.excludeRevisionMark();
    }

Filter PDF Files

Filter has special configuration options that allow greater control over the conversion of Adobe Acrobat PDF files.

Use the pdf2sr Reader

The pdf2sr reader is an alternative that can be used instead of pdfsr for filtering PDF files. It uses a different parsing technology and may yield better results for some files.

The pdf2sr reader has the following features:

· supports standard and custom metadata (non-XMP)
· supports basic text extraction
· supports password protected PDFs
· supports table detection (see Table Detection for PDF Files, on page 71)

The pdf2sr reader has the following limitations:

· does not support logical order
· does not support bidi PDFs
· does not extract subfiles
· does not extract bookmarks from PDFs
· does not give estimations on percent embedded fonts match with display glyphs
· does not support XMP metadata
· does not support headers or footers
· supports annotations only in the raster output, not as searchable text
· does not support content access stream
· does not support tagged content (PDFs)
· does not filter text from XFA-based PDF forms
· does not report document restrictions (see Document Restrictions, on page 89)
· cannot reconstruct missing information from Arabic text in converted PDFs (when you use Microsoft Print to PDF to convert Word documents that contain Arabic text in Calibri font to PDF, the resulting file is often incomplete because information that is required to interpret the text content is missing. The pdfsr reader can reconstruct the missing information, but pdf2sr does not do this).

To use the pdf2sr reader

1. Open the formats.ini file with a text editor.
2. In the [Formats] section, set the following:

    200=pdf2

Filter PDF Files to a Logical Reading Order

The PDF format is primarily designed for presentation and printing of brochures, magazines, forms, reports, and other materials with complex visual designs. Most PDF files do not contain the logical structure of the original document--the correct reading order, for example, and the presence and meaning of significant elements such as headers, footers, columns, tables, and so on.

KeyView can filter a PDF file either by using the file's internal unstructured paragraph flow, or by applying a structure to the paragraphs to reproduce the logical reading order of the visual page. Logical reading order enables KeyView to output PDF files that contain languages that read from right-to-left (such as Hebrew and Arabic) in the correct reading direction.

NOTE: The algorithm used to reproduce the reading order of a PDF page is based on common page layouts. The paragraph flow generated for PDFs with unique or complex page designs might not emulate the original reading order exactly. For example, page design elements such as drop caps, callouts that cross column boundaries, and significant changes in font size might disrupt the logical flow of the output text.

By default, KeyView produces an unstructured text stream for PDF files. This means that PDF paragraphs are extracted in the order in which they are stored in the file, not the order in which they appear on the visual page. For example, a three-column article could be output with the headers and title at the end of the output file, and the second column extracted before the first column. Although this output does not represent a logical reading order, it accurately reflects the internal structure of the PDF.

You can configure KeyView to produce a structured text stream that flows in a specified direction.
This means that PDF paragraphs are extracted in the order (logical reading order) and direction (left-to-right or right-to-left) in which they appear on the page.

The following paragraph direction options are available:

Paragraph Direction Option   Description
Left-to-right                Paragraphs flow logically and read from left to right. You should specify this option when most of your documents are in a language that uses a left-to-right reading order, such as English or German.
Right-to-left                Paragraphs flow logically and read from right to left. You should specify this option when most of your documents are in a language that uses a right-to-left reading order, such as Hebrew or Arabic.
Dynamic                      Paragraphs flow logically. The PDF filter determines the paragraph direction for each PDF page, and then sets the direction accordingly. Filter uses this option when a paragraph direction is not specified.

NOTE: Filtering might be slower when logical reading order is enabled. For optimal speed, use an unstructured paragraph flow.

The paragraph direction options control the direction of paragraphs on a page; they do not control the text direction in a paragraph. For example, a PDF file might contain English paragraphs in three columns that read from left to right, but 80% of the second paragraph might contain Hebrew characters. If the left-to-right logical reading order is enabled, the paragraphs are ordered logically in the output--title paragraph, then paragraph 1, 2, 3, and so on--and flow from the top left of the first column to the bottom right of the third column. However, the text direction of the second paragraph is determined independently of the page by the PDF filter, and is output from right to left.

NOTE: Extraction of metadata is not affected by the paragraph direction setting.
The characters and words in metadata fields are extracted in the correct reading direction regardless of whether logical reading order is enabled.

Enable Logical Reading Order

You can enable logical reading order by using either the API or the formats.ini file. Setting the paragraph direction in the API overrides the setting in the formats.ini file.

Use the Java API

To enable PDF logical reading order in the API, use the setPDFLogicalOrder(int orderFlag) method, and set the orderFlag argument to one of the following flags:

Flag                     Description
PDF_LOGICAL_ORDER_LTR    Logical reading order and left-to-right paragraph direction
PDF_LOGICAL_ORDER_RTL    Logical reading order and right-to-left paragraph direction
PDF_LOGICAL_ORDER_AUTO   Logical reading order. The PDF reader determines the paragraph direction for each PDF page, and then sets the direction accordingly. Filter uses this option when a paragraph direction is not specified.
PDF_LOGICAL_ORDER_RAW    Unstructured paragraph flow. This is the default behavior. If logical reading order is enabled, and you want to return to an unstructured paragraph flow, set this flag.

For example:

    objFilter.setPDFLogicalOrder(Filter.PDF_LOGICAL_ORDER_RTL);

The FilterTest sample program demonstrates this method. See FilterTest, on page 100.

Use the formats.ini File

The formats.ini file is in the directory install\OS\bin, where install is the path name of the Filter installation directory and OS is the name of the operating system.

To enable logical reading order by using the formats.ini file

1. Change the PDF reader entry in the [Formats] section of the formats.ini file as follows:

    [Formats]
    200=lpdf

2. Optionally, add the following section to the end of the formats.ini file:

    [pdf_flags]
    pdf_direction=paragraph_direction

   where paragraph_direction is one of the following:

Flag        Description
LPDF_LTR    Left-to-right paragraph direction
LPDF_RTL    Right-to-left paragraph direction
LPDF_AUTO   The PDF filter determines the paragraph direction for each PDF page, and then sets the direction accordingly. Filter uses this option when a paragraph direction is not specified.
LPDF_RAW    Unstructured paragraph flow. This is the default behavior. If logical reading order is enabled, and you want to return to an unstructured paragraph flow, set this flag.

Rotated Text

When a PDF that contains rotated text is filtered, the rotated text is extracted after the text at the end of the PDF page on which the rotated text appears. If the PDF is filtered with logical order enabled, and the amount of rotated text on a page surpasses a predefined threshold, the page is automatically output as an unstructured text stream. You cannot configure this threshold.

Extract Custom Metadata from PDF Files

To extract custom metadata from your PDF files, add the custom metadata names to the pdfsr.ini file provided, and copy the modified file to the bin directory. You can then extract metadata as you normally would.

The pdfsr.ini is in the directory samples\pdfini, and has the following structure:

    <META>
    <TOTAL>total_item_number</TOTAL>,
    /metadata_tag_name datatype,
    </META>

Parameter           Description
total_item_number   The total number of metadata tags that are listed.
metadata_tag_name   The metadata tag name used in the PDF files.
datatype            The data type of the metadata element. The possible types are:
                    · KV_String
                    · KV_Int4
                    · KV_DateTime
                    · KV_ClipBoard
                    · KV_Bool
                    · KV_Unicode
                    · KV_IEEE8
                    · KV_Other

For example:

    <META>
    <TOTAL>4</TOTAL>
    /part_number INT4
    /volume INT4
    /purchase_date DATETIME
    /customer STRING
    </META>

Skip Embedded Fonts

Text in PDF files sometimes contains embedded fonts. If you experience difficulties filtering embedded fonts, there are options in the API, the formats.ini file, and the FilterTest sample program that you can set to skip this type of text.

NOTE: If you choose to skip embedded fonts, none of the content that contains embedded fonts is included in the output.

Use the formats.ini File

To skip embedded fonts using the formats.ini file

· Set the following parameters:

    [pdf_flags]
    skipembeddedfont=TRUE
    embedded_font_threshold=threshold

  where threshold is a value between 0 and 100. A threshold of 100 skips all embedded font text; a threshold of 0 retains all embedded font text.

Set skipembeddedfont to TRUE to enable the embedded_font_threshold parameter. The default value of embedded_font_threshold is 100. If you set skipembeddedfont to TRUE and do not specify the embedded_font_threshold parameter, Filter skips all embedded text.

When you use formats.ini to skip embedded fonts, you can also specify an embedded font threshold, which is an arbitrary percentage probability that the glyph in the embedded text maps to a character value in the output character set (ASCII, UTF-8, and so on). For example, if you specify a threshold of 75, embedded text glyphs that have a 75% or greater probability of correctly matching the character in the output character set are included in the output; glyphs that have a probability of less than 75% of matching the output character set are omitted from the output.

Use the Java API

To skip embedded fonts using the Java API, set the setSkipEmbeddedFont(boolean) method to true.
For example:

    objFilter.setSkipEmbeddedFont(true);

The FilterTest sample program demonstrates this method. See FilterTest, on page 100.

Control Hyphenation

There are two types of hyphens in a PDF document:

· A soft hyphen is added to a word by a word processor to divide the word across two lines. This is a discretionary hyphen and is used to ensure proper text flow in justified text.
· A hard hyphen is intentionally added to a word regardless of the word's position in the text flow. It is required by the rules of grammar or word usage. For example, compound words (such as three-week vacation and self-confident) contain hard hyphens.

By default, KeyView skips the source document's soft hyphens in the Filter output to provide more searchable text content. However, if you want to maintain the document layout, you can keep soft hyphens in the Filter output. To keep soft hyphens, you must enable the soft hyphen flag in formats.ini or in the API.

Use the formats.ini File

To keep soft hyphens by using the formats.ini file, set the following parameter:

    [pdf_flags]
    keepsofthyphen=TRUE

Use the Java API

To keep soft hyphens using the Java API, set the setKeepSoftHyphen(boolean) method to true. For example:

    objFilter.setKeepSoftHyphen(true);

The FilterTest sample program demonstrates this method. See FilterTest, on page 100.

Filter Portfolio PDF Files

Portfolio PDF files contain subfiles and an ActionScript interface for navigating between them. You can use the extraction API to extract the subfiles. See Extract Subfiles from PDF Files, on page 51.

Table Detection for PDF Files

PDF files often contain data presented in a tabular form. However, there is no information about the table stored within the PDF itself; the text is simply placed in an arrangement that looks like a table to the human eye. When this data is filtered, it can be very difficult to reconstruct the table.
If table detection is enabled, KeyView attempts to recognize tables within PDF pages, and to reconstruct them before they are output. For each page of the document, KeyView outputs the contents of each table first, and then outputs all remaining text on the page.

Micro Focus recommends that tab delimited output is also enabled when using table detection. This means that any tables detected appear in the output text in tab delimited format.

To enable table detection and tab delimited output

· In the Java API, call the setTableDetection and setTabDelimited methods on the filter object, for example:

    filter.setTableDetection(true);
    filter.setTabDelimited(true);

· In formats.ini, set the following parameters. (This is an alternative approach - you do not need to do this if you have configured this feature through the API).

    [Options]
    TableDetection=TRUE
    TabDelimited=TRUE

NOTE: Table detection is only available with the pdf2sr reader. To enable this reader, set the following configuration parameter in formats.ini:

    [Formats]
    200=pdf2

Filter Spreadsheet Files

Filter has special configuration options that allow greater control over the conversion of spreadsheet files.

Filter Worksheet Names

Normally, Filter does not extract worksheet names from a spreadsheet because it is assumed the text should not be exposed. You can change this default behavior, and extract worksheet names by adding the following lines to the formats.ini file:

    [Options]
    getsheetnames=1

Filter Hidden Text in Microsoft Excel Files

Normally, Filter does not filter hidden text from a Microsoft Excel spreadsheet because it is assumed the text should not be exposed.
You can change this default behavior, and extract text from hidden rows, columns, and sheets from Excel spreadsheets by adding the following lines to the formats.ini file:

    [Options]
    gethiddeninfo=1

Specify Date and Time Format on UNIX Systems

In Microsoft Excel you can choose to format dates and times according to the system locale. On Windows, KeyView uses the system locale settings to determine how these dates and times should be formatted. In other operating systems, KeyView uses the U.S. short date format (mm/dd/yyyy). You can change this by specifying the formats you wish to use in the formats.ini file.

To specify the system date and time format on UNIX systems

· In the formats.ini file, specify the following options:
    o SysDateTime. The format to use when a cell is formatted using the system format including both the date and the time.
    o SysLongDate. The format to use when a cell is formatted using the system long date format.
    o SysShortDate. The format to use when a cell is formatted using the system short date format.
    o SysTime. The format to use when a cell is formatted using the system time format.

NOTE: These values cannot contain spaces.

For example, if you specify SysDateTime=%d/%m/%Y, dates and times are extracted in the following format:

    28/02/2008

The format arguments are the same as those for the strftime() function. Refer to the following webpage for more information.

http://linux.die.net/man/3/strftime

Filter Very Large Numbers in Spreadsheet Cells to Precision Numbers

Numbers in Microsoft Excel files can now be extracted and written to the output without formatting. By default, numbers are extracted in the format specified by the Excel file (for example, General, Currency and Date). Spreadsheets might contain cells that have very large numbers in them. Excel displays the numbers in a scientific notation that rounds or truncates the numbers.
To extract numbers without formatting, add the following options in the formats.ini file:

    [Options]
    ignoredefnumformats=1

Extract Microsoft Excel Formulas

When you filter a Microsoft Excel spreadsheet, KeyView extracts the value of each cell. The value of a cell might be calculated from a formula, but the formula is not included in the output unless you configure KeyView to include it. You can extract the cell value, the formula, or both.

For example, if you choose to extract both the cell value and the formula, the output might look like this:

    245 = SUM(B21:B26)

In this example, the calculated value from the cell is 245 and the formula from which the value is derived is SUM(B21:B26).

NOTE: Depending on the complexity of the formulas, enabling formula extraction might result in slightly slower performance.

To extract formulas

• In the Java API, call the setShowFormulas method on the filter object, for example:

    filter.setShowFormulas(Filter.ShowFormulas.VALUES_AND_FORMULAS);

• You can extract formulas by adding the following parameter to formats.ini:

    [Options]
    getformulastring=option

  where option is one of the following:

    Option    Description
    0         Extract the cell value only. This is the default.
    1         Extract the formula only.
    2         Extract the formula and the cell value.

If a function in a formula is invalid, and option 1 or 2 is specified, only the calculated value is extracted.

Standardize Cell Formats

In Microsoft Excel you can format cell values. For example, the date "15/09/2021" could be formatted as "15 September 2021" or "2021-09-15". By default, KeyView extracts cell values with formatting, as they would appear in Excel. If you prefer, you can configure KeyView to standardize cell values.
To standardize cell formats

• In the Java API, call the setStandardizeCellFormats method on the filter object, for example:

    filter.setStandardizeCellFormats(true);

• In formats.ini, set the following parameter. (This is an alternative approach - you do not need to do this if you have configured this feature through the API.)

    [Options]
    StandardizeCellFormats=TRUE

When this feature is enabled, KeyView formats any cell where a number has been entered according to the following rules.

Numbers

Numbers are printed to the maximum length entered, that is, the full number put into the cell, without any rounding. Negative numbers are printed with a dash in front of them (as opposed to, for example, bracket form). The following table provides some examples.

    Example                Formatted value    KeyView (standardized) output
    Rounded number         600                600.1
    Scientific notation    1.56E+04           15600
    Fraction               17/20              0.85
    Percentage             46%                0.46

Text

All text that is part of the format string is stripped, including currency symbols.

Dates

All dates are printed in full ISO-8601 format (that is YYYY-MM-DDTHH:MM:SS). There are two exceptions to this rule:

• Cases where the date format contains a time delta (that is, "[h]", "[m]", or "[s]"). In this case, the time is displayed as an interval, which is the number of days (where a day is defined as a period of 24 hours). The time is printed in the ISO-8601 time interval form, for example P1.234D.

• Cases where the absolute value of the cell is less than 1.0, and the date format contains only time components. In Excel, values between 0.0 and 1.0 correspond to the fictional date 1900-01-00, and are used to express times without an associated date.
For example:

    Value    Date format    KeyView output
    0.5      hh:mm:ss       12:00:00
    0.5      dd hh          1900-01-00 12:00:00
    1.5      hh:mm:ss       1900-01-01 12:00:00
    1.5      dd hh          1900-01-01 12:00:00

Tab Delimited Empty Cells

By default, when filtering spreadsheet files, KeyView skips over empty cells. This behavior removes unnecessary tab characters from the output, but it also loses the table structure. If you require the table structure, you can configure KeyView to ensure that tabs exist between empty cells.

To enable tab delimiters around empty cells

• In formats.ini, set the TabDelimited parameter to TRUE. For example:

    [Options]
    TabDelimited=TRUE

Filter Presentation Files to a Logical Reading Order

With some file formats, for example Microsoft PowerPoint presentations, the order of the text inside the file has no relation to the layout of the text on the page or screen. Recently modified text might appear at the end of a file, even though that text belongs at the beginning of the document. You can configure KeyView to process position information and sort the extracted text so that it is returned in the correct (reading) order.

NOTE: This feature supports Microsoft PowerPoint files only.

To enable logical reading order

• In the Java API, call the method setFilterLogicalOrder on the Filter object.

• In the formats.ini file, find the [Options] section, and set LogicalOrder to 1. (This is an alternative approach - you do not need to do this if you have configured this feature through the API.) For example:

    [Options]
    LogicalOrder=1

Related Topics

• Filter PDF Files to a Logical Reading Order, on page 65

Filter HTML Files

KeyView can filter comments from HTML documents. To enable comment filtering, you must set a flag in the formats.ini file. The formats.ini file is in the install\OS\bin directory, where install is the Filter installation directory and OS is the name of the operating system.
To enable filtering of comments from HTML files

1. Open the formats.ini file in a text editor.
2. Under [Options], set the following flag.

    GetHTMLHiddenInfo=1

Filter XML Files

Filter SDK enables you to extract all or selected content from source XML files. You can specify the elements and attributes extracted from a document using the API or an INI file (see Configure Element Extraction for XML Documents, below).

Filter detects the following XML formats:

• generic XML
• Microsoft Office 2003 XML (Word, Excel, and Visio)
• StarOffice/OpenOffice XML (text document, presentation, and spreadsheet)

See File Format Detection, on page 238 for more information on format detection.

Configure Element Extraction for XML Documents

When filtering XML files, you can specify which elements and attributes are extracted according to the file's format ID or root element. This is useful when you want to extract only relevant text elements, such as abstracts from reports, or a list of authors from an anthology.

A root element is an element in which all other elements are contained. In the XML sample below, book is the root element:

    <book>
      <title>XML Introduction</title>
      <product id="33-657" status="draft">XML Tutorial</product>
      <chapter>Introduction to XML
        <para>What is HTML</para>
        <para>What is XML</para>
      </chapter>
      <chapter>XML Syntax
        <para>Elements must have a closing tag</para>
        <para>Elements must be properly nested</para>
      </chapter>
    </book>

For example, you could specify that when filtering files with the root element book, the element title is extracted as metadata, and only product elements with a status attribute value of draft are extracted.

When you extract an element, the child elements within the element are also extracted. For example, if you extract the element chapter from the sample above, the child element para is also extracted.
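
The selection rules described above can be illustrated outside KeyView. The following is a minimal sketch that uses only the standard JDK DOM parser, not the KeyView API; the class name ElementSelectionSketch and its select method are hypothetical names introduced for illustration. It shows two things from the text: selecting an element only when an attribute has a given value, and the fact that the text of an extracted element includes the text of its child elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustration only: models the selection rules with the JDK DOM parser.
// It does not call KeyView and does not reproduce KeyView's output format.
public class ElementSelectionSketch {

    // The sample document from the text, written without whitespace between elements.
    static final String SAMPLE =
        "<book>"
        + "<title>XML Introduction</title>"
        + "<product id=\"33-657\" status=\"draft\">XML Tutorial</product>"
        + "<chapter>Introduction to XML"
        + "<para>What is HTML</para><para>What is XML</para></chapter>"
        + "<chapter>XML Syntax"
        + "<para>Elements must have a closing tag</para>"
        + "<para>Elements must be properly nested</para></chapter>"
        + "</book>";

    // Returns the text of every matching element. Pass attribute == null to
    // match on the element name only. Child element text is included.
    static List<String> select(String xml, String element, String attribute, String value) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            List<String> texts = new ArrayList<>();
            NodeList nodes = doc.getElementsByTagName(element);
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                if (attribute == null || value.equals(e.getAttribute(attribute))) {
                    texts.add(e.getTextContent());
                }
            }
            return texts;
        } catch (Exception ex) {
            throw new IllegalStateException("Could not parse the XML", ex);
        }
    }

    public static void main(String[] args) {
        // Only product elements whose status attribute is "draft".
        System.out.println(select(SAMPLE, "product", "status", "draft"));
        // Each chapter, including the text of its para children.
        System.out.println(select(SAMPLE, "chapter", null, null));
    }
}
```

In KeyView itself, the equivalent selection is expressed through the element extraction settings rather than through code like this.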
Filter SDK defines default element extraction settings for the following XML formats:

• generic XML
• Microsoft Office 2003 XML (Word, Excel, and Visio)
• StarOffice/OpenOffice XML (text document, presentation, and spreadsheet)

These settings are defined internally and are used when filtering these file formats; however, you can modify their values. In addition to the default extraction settings, you can also add custom settings for your own XML document types. If you do not define custom settings for your own XML document types, the settings for generic XML are used.

Modify Element Extraction Settings

You can modify configuration settings for XML documents through either the API or the kvxconfig.ini file.

Use the Java API

You can use the Java API to modify the settings for the standard XML document types or add configuration settings for your own XML document types.

To modify settings

1. Declare an array of XMLConfigSet objects.
2. Create an instance of ConfigOption with the following arguments:
   a. Set the OptionType to CFG_SETXMLCONFIGINFO.
   b. Set the OptionValue to 0.
   c. Set OptionData to the array object.
3. Call the setConfigOption method, and pass in the ConfigOption instance.
4. Call a filter method.

For example:

    // Declare the array; populate it with one XMLConfigSet per
    // document type before you pass it to ConfigOption (not shown here).
    XMLConfigSet[] XMLInfo;
    ConfigOption config = new ConfigOption(Filter.CFG_SETXMLCONFIGINFO, 0, XMLInfo);
    objFilter.setConfigOption(config);

Use an Initialization File

You can use the initialization file to modify the settings for the standard XML document types or add configuration settings for your own XML document types.

To modify settings

1. Modify the kvxconfig.ini file.
2. Use the initialization file when processing the XML file.

See Modify Element Extraction Settings in the kvxconfig.ini File, below.

The Java sample program FilterTest demonstrates how to use the initialization file in the filtering process. See Sample Programs, on page 91.
Modify Element Extraction Settings in the kvxconfig.ini File

The kvxconfig.ini file contains default element extraction settings for supported XML formats. The file is in the directory install\OS\bin, where install is the path name of the Filter installation directory and OS is the name of the operating system. For example, the following entry defines extraction settings for the Microsoft Visio 2003 XML format:

    [config3]
    eKVFormat=MS_Visio_XML_Fmt
    szRoot=
    szInMetaElement=DocumentProperties
    szExMetaElement=PreviewPicture
    szInContentElement=Text
    szExContentElement=
    szInAttribute=

The following options are available:

Configuration Option    Description

eKVFormat
    The format ID as detected by the KeyView detection module. This determines the file type to which these extraction settings apply. See File Format Detection, on page 238 for more information on format ID values. If you are adding configuration settings for a custom XML document type, this is not defined.

szRoot
    The file's root element. When the format ID is not defined, the root element is used to determine the file type to which these settings apply. To further qualify the element, specify its namespace. See Specify an Element's Namespace and Attribute, below.

szInMetaElement
    The elements extracted from the file as metadata. All other elements are extracted as text. Separate multiple entries with commas. To further qualify the element, specify its namespace, its attributes, or both. See Specify an Element's Namespace and Attribute, below.

szExMetaElement
    The child elements in the included metadata elements that are not extracted from the file as metadata. For example, the default extraction settings for the Visio XML format extract the DocumentProperties element as metadata. This element includes child elements such as Title, Subject, Author, Description, and so on. However, the child element PreviewPicture is defined in szExMetaElement because it is binary data and should not be extracted. You cannot exclude any metadata elements from the output for StarOffice files. All metadata is extracted regardless of this setting. Separate multiple entries with commas. To further qualify the element, specify its namespace, its attributes, or both. See Specify an Element's Namespace and Attribute, below.

szInContentElement
    The elements extracted from the file as content text. Enter an asterisk (*) to extract all elements including child elements. Separate multiple entries with commas. To further qualify the element, specify its namespace, its attributes, or both. See Specify an Element's Namespace and Attribute, below.

szExContentElement
    The child elements in the included content elements that are not extracted from the file as content text. Separate multiple entries with commas. To further qualify the element, specify its namespace, its attributes, or both. See Specify an Element's Namespace and Attribute, below.

szInAttribute
    The attribute values extracted from the file. If attributes are not defined here, attribute values are not extracted. Enter the namespace (if used), element name, and attribute name in the following format:

        namespace:elementname@attributename

    For example:

        microfocus:division@name

    Separate multiple entries with commas.

Specify an Element's Namespace and Attribute

To further qualify an element, you can specify that the element exist in a certain namespace and/or contain a specific attribute. To define the namespace and attribute of an element, enter the following:

    ns_prefix:elemname@attribname=attribvalue

NOTE: You must enclose attribute values that contain spaces in quotation marks.
For example, the entry bg:language@id=xml extracts a language element in the namespace bg that contains the attribute name id with the value of "xml". This entry extracts the following element from an XML file:

    <bg:language id="xml">XML is a simple, flexible text format derived from SGML</bg:language>

but does not extract:

    <bg:language id="sgml">SGML is a system for defining markup languages.</bg:language>

or

    <adv:language id="xml">The namespace should be a Uniform Resource Identifier (URI).</adv:language>

Add Configuration Settings for Custom XML Document Types

You can define element extraction settings for custom XML document types by adding the settings to the kvxconfig.ini file. For example, for files that contain the root element microfocusxml, you can add the following section to the end of the initialization file:

    [config101]
    eKVFormat=
    szRoot=microfocusxml
    szInMetaElement=dc:title,dc:meta@title,dc:meta@name=title
    szExMetaElement=
    szInContentElement=microfocus:division@name=keyview,microfocus:division@name=idol,p@style="Heading 1"
    szExContentElement=
    szInAttribute=microfocus:division@name

The custom extraction settings must be preceded by a section heading named [configN], where N is an integer starting at 100 and increasing by 1 for each additional file type, as in [config100], [config101], [config102], and so on. The default extraction settings for the supported XML formats are numbered config0 to config99. Currently only 0 to 6 are used.

Because a custom XML document type is not recognized by the KeyView detection module, the format ID is not defined. The file type is identified by the file's root element only.

If a custom XML document type is not defined in the kvxconfig.ini file or by the setConfigOption method, then the default extraction settings for a generic XML document are used.
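
To make the ns_prefix:elemname@attribname=attribvalue syntax concrete, the following sketch splits a single entry into its parts and checks it against an element. This is an illustration of the syntax only: the class ElementQualifier and its methods are hypothetical, KeyView's own parsing is internal and is not shown in this guide, and the rule that an entry without a prefix matches only elements without a prefix is an assumption made for the sketch.

```java
import java.util.Map;
import java.util.Objects;

// Illustration only: a hypothetical model of one kvxconfig.ini element entry.
public class ElementQualifier {
    final String namespace;  // null when the entry has no prefix
    final String element;
    final String attribute;  // null when the entry has no @attribute part
    final String value;      // null when the entry has no =value part

    private ElementQualifier(String namespace, String element, String attribute, String value) {
        this.namespace = namespace;
        this.element = element;
        this.attribute = attribute;
        this.value = value;
    }

    // Parses one entry such as bg:language@id=xml or p@style="Heading 1".
    static ElementQualifier parse(String entry) {
        String name = entry.trim();
        String attribute = null;
        String value = null;

        int at = name.indexOf('@');
        if (at >= 0) {
            String attrPart = name.substring(at + 1);
            name = name.substring(0, at);
            int eq = attrPart.indexOf('=');
            if (eq >= 0) {
                value = stripQuotes(attrPart.substring(eq + 1));
                attrPart = attrPart.substring(0, eq);
            }
            attribute = attrPart;
        }

        String namespace = null;
        int colon = name.indexOf(':');
        if (colon >= 0) {
            namespace = name.substring(0, colon);
            name = name.substring(colon + 1);
        }
        return new ElementQualifier(namespace, name, attribute, value);
    }

    // Values that contain spaces are enclosed in quotation marks in the INI file.
    private static String stripQuotes(String s) {
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            return s.substring(1, s.length() - 1);
        }
        return s;
    }

    // Assumption: prefixes are compared exactly, so a missing prefix matches
    // only an element that has no prefix.
    boolean matches(String prefix, String elementName, Map<String, String> attributes) {
        if (!Objects.equals(namespace, prefix) || !element.equals(elementName)) {
            return false;
        }
        if (attribute == null) {
            return true;
        }
        if (!attributes.containsKey(attribute)) {
            return false;
        }
        return value == null || value.equals(attributes.get(attribute));
    }
}
```

With this model, parsing bg:language@id=xml gives a qualifier that matches the first bg:language element in the example above and rejects the other two, which mirrors the behavior the text describes.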
Configure Headers and Footers

You can configure custom header and footer tags for word processing and spreadsheet documents by editing the formats.ini file.

To configure headers and footers

1. Open the formats.ini file.
2. In the [Options] section, add the following items:

    header_start_tag=HeaderStart
    header_end_tag=HeaderEnd
    footer_start_tag=FooterStart
    footer_end_tag=FooterEnd

   For example:

    header_start_tag=<myHeaderTag>
    header_end_tag=</myHeaderTag>
    footer_start_tag=<myFooterTag>
    footer_end_tag=</myFooterTag>

NOTE: You must encode custom tags in UTF-8.

Error Messages

When a KeyView exception is thrown, it might be caused by one of the following errors.

Exception    Description

KVERR_Success    Function completed successfully.
KVERR_DLLNotFound    A DLL or shared library was not found.
KVERR_OutOfCore    Memory allocation failure.
KVERR_processCancelled    Callback function returns FALSE.
KVERR_badInputStream    Invalid or corrupt input stream.
KVERR_badOutputType    Invalid output is requested.
KVERR_General    General error.
KVERR_FormatNotSupported    File format is not supported.
KVERR_PasswordProtected    File is encrypted or password-protected. KeyView only supports secure PST, NSF, and ZIP files.
KVERR_ADSNotFound    Adobe Document Server not found. This error is obsolete.
KVERR_AutoDetFail    Autodetect error.
KVERR_AutoDetNoFormat    Unable to detect file format.
KVERR_ReaderInitError    Error initializing the reader.
KVERR_NoReader    No reader available for this format.
KVERR_CreateOutputFileFailed    Unable to create output file. If the overwrite flag in setOverWrite is FALSE and a subfile has the same name as a file in the target path, this error is generated.
KVERR_CreateTempFileFailed    Unable to create temporary file.
KVERR_ErrorWritingToOutputFile    Error writing to output file.
KVERR_CreateProcessFailed    Error creating a child process.
KVERR_WaitForChildFailed    Wait for child process failed.
KVERR_ChildTimeOut    Child process hung/timed out.
KVERR_ArchiveFileNotFound    Attempt to extract nonexistent file.
KVERR_ArchiveFatalError    Fatal error processing an archive file.
KVError_OpenStreamFailure = KVERR_ArchiveFatalError + 1    Failed to open a stream during out-of-process filtering.
KVError_InterfaceFunctionNotFound    An interface function was not found during out-of-process filtering.
KVError_InputFileNotFound    Could not find the input file during out-of-process filtering.
KVError_OpenOutputFileFailed    Could not open the output file during out-of-process filtering.
KVError_MemoryLeak    Memory leak occurred during out-of-process filtering.
KVError_MemoryOverwrite    Memory overwrite occurred during out-of-process filtering.
KVError_GPF    Exception occurred during out-of-process filtering.
KVError_OopCore    Memory dump was generated in a child process during out-of-process filtering.
KVError_KVoopLogFailed    Creation of out-of-process error log failed.
KVError_OverNestedFileLimit    The container file has more than the allowable number of child documents. One or more child documents were not converted. Currently, this is not used.
KVError_PSTAccessFailed    The PST file could not be converted. This error might be returned when a call to extOpenDocument returns NULL for one of the following reasons:
    • Microsoft Outlook client is not installed
    • Microsoft Outlook client is installed, but is not the default email client
    • Microsoft Outlook client is installed, but is not configured correctly
    • PST file is corrupt
    • PST file is read-only (PST files must allow read and write access)
    • MAPI call fails
    • The bit editions of Microsoft Outlook do not match the bit editions of the KeyView software. For example, if 32-bit KeyView is used, 32-bit Outlook must be installed. If 64-bit KeyView is used, 64-bit Outlook must be installed.
KVError_PasswordRequired    To open the file, credentials must be provided. This error might be returned when a call to extOpenDocument returns NULL.
KVError_InvalidArgs    The input argument or structure is invalid. This is generated by the File Extraction APIs.
KVError_OutputFileExists    A file with the same name already exists in the output directory. This error is generated when extracting a subfile from a container file with the setOverWrite flag set to FALSE, and a file by the same name already exists in the output directory.
KVError_ReaderUsageDenied    The current license key does not enable the document reader required to filter the file. This error might be returned when a call to extOpenDocument returns NULL. Some document readers are considered advanced features and are licensed separately from the KeyView SDK (for example, the PST and MBX readers). Contact your Micro Focus sales representative to get an updated license key.
KVError_OopBadConfig    Information in the kvxconfig.ini file is incomplete and cannot be used to filter the XML file.
KVError_OopBrokenPipe    Data was not transferred between the parent and child processes during out-of-process filtering because either the parent or child failed.
KVError_OopPipeOEF    Data was not transferred between the parent and child processes during out-of-process filtering because the parent process was shut down.
KVError_IPCTimeOut    Either the parent or child process is waiting for a reply or request during out-of-process filtering.
KVError_InvalidOopDriverSignature    A client sent a request to the File Extraction out-of-process server, but the context driver does not exist on the server.
KVError_InvalidOopServiceSignature    A client sent a request to a File Extraction out-of-process server that does not exist. If this error is generated on the call to fpClose(), it can be ignored.

Tab Delimited Output for Spreadsheets and Embedded Tables

You can use KeyView to convert spreadsheets, embedded tables in Word Processing documents (for example, Microsoft Word documents), and tables detected by Optical Character Recognition (OCR), to tab-delimited form. In this format, KeyView inserts a tab character between each cell, and a line break between each row. Tab and line break characters in the cells are replaced with spaces. For spreadsheets, this format ensures that tabs exist between empty cells, which can be useful when you need to keep the table structure after filtering.

To enable tab delimited output for spreadsheets and embedded tables

• In the Java API, call the setTabDelimited method on the filter object, for example:

    filter.setTabDelimited(true);

• In formats.ini, set the following parameter. (This is an alternative approach - you do not need to do this if you have configured this feature through the API.)

    [Options]
    TabDelimited=TRUE

Table Output for IDOL Eduction

For files that contain multiple tables, KeyView includes an option that creates output with delimiters between tables that can be understood by IDOL Eduction. This option allows Eduction to extract entity data from tables. To use this option, you must enable Tab Delimited output, and set the target character set to KVCS_UTF8.
To enable table delimiters for spreadsheets and embedded tables

• In formats.ini, set the following parameter.

    [Options]
    OutputTableDelimiters=TRUE

For more information about table extraction in IDOL Eduction, refer to the IDOL Eduction User and Programming Guide.

Exclude Japanese Guide Text

This option prevents output of Japanese phonetic guide text when Microsoft Excel (.xlsx) files are processed.

To prevent output of Japanese phonetic guide text

• In the Java API, call the setNoPhoneticGuides method on the filter object, for example:

    filter.setNoPhoneticGuides(true);

• In formats.ini, set the following parameter. (This is an alternative approach - you do not need to do this if you have configured this feature through the API.)

    [Options]
    NoPhoneticGuides=TRUE

Source Code Identification

When KeyView auto-detects a file that contains source code, it can attempt to identify the programming language that it is written in.

When you do not enable source code identification, files containing source code may be identified as ASCII text files, causing the application to treat them in the same way as ordinary text. However, in many instances, it can be useful to route these files elsewhere or filter them out. For example, indexing source code into an IDOL index has minimal value and could bloat the engine with terms that are of no use in retrieval. You can use source code identification to identify files containing a particular programming language as a more specific format.

NOTE: Source code identification is available only on certain platforms (see source code identification in the platform differences section).

You can set source code identification to different levels.

    Option                   Description
    KVSOURCECODE_OFF         Do not enable source code identification.
    KVSOURCECODE_ENABLED     Enable source code identification for the most common source code formats.
    KVSOURCECODE_EXTENDED    Enable source code identification for all supported source code formats. This option might lead to false positives in some cases (for example, a C++ file might get identified as a rarer format).

For the complete list of source code formats supported for both options, see Supported Formats, on page 107.

To configure source code identification

• In the Java API, call the setSourceCodeDetection method on the filter object, for example:

    filter.setSourceCodeDetection(Filter.SourceCodeDetection.ENABLED);

• In formats.ini, set the following parameter to the appropriate level. (This is an alternative approach - you do not need to do this if you have configured this feature through the API.)

    [Options]
    SourceCodeDetection=KVSOURCECODE_ENABLED

Optical Character Recognition

When processing raster image files, KeyView can perform Optical Character Recognition (OCR) to attempt to filter text that might be visible in the image. If text is detected to form part of a table, it is filtered in the same way as tables in Word Processing documents.

NOTE: KeyView performs OCR only on standalone raster files, not on images embedded inside other documents. For embedded images, you must first extract the images by using the Extract Images option.

NOTE: OCR is available only on certain platforms (see Optical Character Recognition in the platform differences section).

If your license includes OCR, it is enabled by default.

To enable or disable OCR

• Call the setOcr method of the Filter class.

Optimize OCR Performance

The default settings for OCR attempt to detect as much text as possible. For example, KeyView attempts to detect text in multiple languages and alphabets, and rotated text in increments of 90 degrees from upright. This increases the amount of text that can be detected, prioritizing recall over processing time.
If you know what you will be processing in advance, you can specify OCR options to improve performance. To configure OCR through the Java API, call the method filter.setOcr.

For example, if the input is scanned pages that contain only English or only Japanese text, the following configuration could result in a performance improvement. However, it may fail to recognize text in some images such as landscape pages where the text is not upright.

    filter.setOcr(new OCROptions("en ja", OCROptions.Orientation.UPRIGHT, OCROptions.DetectAlphabet.LISTED));

Text Finding Mode

OCR can use different algorithms for finding text. Each algorithm is optimized for a different type of image:

• Document - A scanned or printed page of formatted text, such as a report, magazine, or letter.
• Scene - An image of a general scene that contains text, such as a photograph or TV footage.
• Hollow - A scene image containing outlined text, such as white characters with a black border which are often used in television subtitles.
• Auto - The IDOL OCR library selects the algorithm automatically.

Languages

OCR supports many different languages. For a list of supported languages, see OCR Supported Languages, on page 282. If you know that your files only contain text in a certain language or a small number of languages, you can improve both processing speed and accuracy by configuring OCR with this information.

Orientation

By default, OCR attempts to detect text that appears rotated, in 90-degree increments from upright. This means that KeyView can filter text from an image, even if it has been rotated or was scanned upside-down. If you know that your images contain only upright text, you can improve processing speed by disabling this feature.

Alphabet Detection

Sometimes, if you do not know the language of the input text in advance of processing, you might specify multiple languages.
OCR requires more processing time for each additional language, especially when the languages span multiple alphabets (Latin, Cyrillic, Chinese, Arabic, and so on). You can configure OCR to detect the alphabet for each image, before attempting to recognize characters. You can choose one of the following options.

• Off. By default, OCR does not detect the alphabet. Use this option when you have specified a single language or multiple languages that use the same alphabet. Micro Focus also recommends this option when you expect an image to use multiple alphabets (for example, when there is English and Arabic text on the same page).

• Listed. OCR detects the alphabet, but only considers alphabets that are represented in your chosen list of languages. This option can reduce the time required to recognize characters, because languages that do not match the detected alphabet are ignored. For example, if you set languages="en ja ko" (English, Japanese, and Korean) and OCR detects the Latin alphabet, OCR ignores the Japanese and Korean languages. Micro Focus recommends using this option when each source image uses a single alphabet, and the list of possible languages is known but spans multiple alphabets.

• Any. OCR detects the alphabet that is used, and considers all alphabets. This option can reduce the time required to recognize characters, because languages that do not match the detected alphabet are ignored. If none of your chosen languages match the detected alphabet, OCR does not recognize characters and there is no output. Micro Focus recommends using this option instead of Listed when you want to reject images that do not match any of the specified languages.

If your input contains Chinese, Japanese, or Korean text with some ASCII characters, you can safely set this parameter to any of the available options, because OCR includes ASCII characters for those languages.
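
The difference between the Off, Listed, and Any options can be modeled with a small sketch. This is not how the OCR library is implemented and it does not call KeyView; the class AlphabetDetectionSketch, its methods, and the language-to-alphabet table are all hypothetical and exist only to illustrate the rules described above: which alphabets the detector may choose from, and which configured languages remain once an alphabet has been detected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustration only: a hypothetical model of the alphabet detection options.
public class AlphabetDetectionSketch {

    enum Mode { OFF, LISTED, ANY }

    // Hypothetical language-to-alphabet table, for illustration only.
    static final Map<String, String> ALPHABET_OF = new HashMap<>();
    static {
        ALPHABET_OF.put("en", "Latin");
        ALPHABET_OF.put("fr", "Latin");
        ALPHABET_OF.put("ru", "Cyrillic");
        ALPHABET_OF.put("ar", "Arabic");
        ALPHABET_OF.put("ja", "Japanese");
        ALPHABET_OF.put("ko", "Korean");
    }

    static List<String> split(String languages) {
        return Arrays.asList(languages.trim().split("\\s+"));
    }

    // The alphabets that the detector may choose from in each mode.
    static Set<String> candidateAlphabets(Mode mode, String languages) {
        Set<String> candidates = new TreeSet<>();
        if (mode == Mode.ANY) {
            candidates.addAll(ALPHABET_OF.values());       // all known alphabets
        } else if (mode == Mode.LISTED) {
            for (String lang : split(languages)) {
                candidates.add(ALPHABET_OF.get(lang));     // only configured ones
            }
        }
        return candidates;                                  // OFF: no detection
    }

    // The configured languages that are still used for recognition.
    static List<String> languagesToTry(Mode mode, String languages, String detectedAlphabet) {
        List<String> configured = split(languages);
        if (mode == Mode.OFF) {
            return new ArrayList<>(configured);             // every language is tried
        }
        List<String> kept = new ArrayList<>();
        for (String lang : configured) {
            if (detectedAlphabet.equals(ALPHABET_OF.get(lang))) {
                kept.add(lang);
            }
        }
        return kept;                                         // empty means no output
    }
}
```

In this model, the "en ja ko" example from the text with a detected Latin alphabet leaves only "en", and the Any option with a detected alphabet that matches none of the configured languages leaves an empty list, which corresponds to the case where there is no output.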
Configure the Proxy for RMS

When KeyView needs to access contents that are protected by the Microsoft Rights Management System (RMS), it must make HTTP requests. By default, KeyView uses the system proxy settings for these requests.

To use different proxy settings, you can configure them in the [RMS] section of the formats.ini configuration file. The following table describes the available options.

Parameter    Description

UseSystemProxy
    Whether to obtain details about your HTTP proxy from the system. By default, this parameter is set to TRUE, which means:
    • On Microsoft Windows platforms, KeyView reads the proxy settings that are configured in the Windows Control Panel.
    • On Linux, KeyView reads the proxy settings from environment variables such as HTTP_PROXY and HTTPS_PROXY.
    You can use UseSystemProxy instead of setting the other proxy parameters (ProxyHost, ProxyPort, ProxyUsername, and ProxyPassword). When UseSystemProxy is set to TRUE, you must remove these other parameters from your configuration.
    NOTE: On Linux platforms, KeyView can retrieve a proxy username and password from an environment variable in the form http://username:password@proxy.example.com:8080/. However, this value cannot be encrypted. On Microsoft Windows platforms, the operating system does not return a proxy username and password, so these are not supported.
    Set UseSystemProxy to FALSE to use different proxy settings. In this case you must set at least ProxyHost and ProxyPort.

ProxyHost
    The host name or IP address of the proxy server.

ProxyPassword
    The password to use to authenticate with the proxy server.

ProxyPort
    The port of the proxy server to use to access the repository. This port must be greater than 0, and less than 65535.

ProxyUsername
    The user name to use to authenticate with the proxy server.
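
The constraints in the table above can be checked before you write the [RMS] section. The following is a minimal sketch of such a pre-check, not part of the KeyView API: the class RmsProxyCheck and its problems method are hypothetical, the settings are passed in as a plain map of parameter names to values, and treating any value other than FALSE as TRUE is an assumption made for the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustration only: a hypothetical pre-check of [RMS] proxy settings.
public class RmsProxyCheck {

    static final String[] EXPLICIT_PARAMETERS =
        {"ProxyHost", "ProxyPort", "ProxyUsername", "ProxyPassword"};

    // Returns a list of problems; an empty list means the settings are
    // consistent with the rules described in the table.
    static List<String> problems(Map<String, String> rms) {
        List<String> problems = new ArrayList<>();

        // UseSystemProxy defaults to TRUE when it is not set.
        // Assumption: any value other than FALSE is treated as TRUE.
        String flag = rms.containsKey("UseSystemProxy") ? rms.get("UseSystemProxy") : "TRUE";
        boolean useSystemProxy = !"FALSE".equalsIgnoreCase(flag.trim());

        if (useSystemProxy) {
            // The explicit proxy parameters must be removed in this case.
            for (String key : EXPLICIT_PARAMETERS) {
                if (rms.containsKey(key)) {
                    problems.add(key + " must be removed when UseSystemProxy is TRUE");
                }
            }
            return problems;
        }

        // With UseSystemProxy=FALSE, at least ProxyHost and ProxyPort are required.
        if (!rms.containsKey("ProxyHost")) {
            problems.add("ProxyHost is required when UseSystemProxy is FALSE");
        }
        String port = rms.get("ProxyPort");
        if (port == null) {
            problems.add("ProxyPort is required when UseSystemProxy is FALSE");
        } else {
            try {
                int value = Integer.parseInt(port.trim());
                if (value <= 0 || value >= 65535) {
                    problems.add("ProxyPort must be greater than 0 and less than 65535");
                }
            } catch (NumberFormatException e) {
                problems.add("ProxyPort is not a number");
            }
        }
        return problems;
    }
}
```

For example, a map that sets UseSystemProxy to FALSE, ProxyHost to a host name, and ProxyPort to 8080 produces no problems, whereas leaving ProxyHost in place while UseSystemProxy is TRUE is reported.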
Document Restrictions

Some applications, and corresponding file formats, allow users to restrict the ways in which a document can be used. For example, you might be able to read a document but additional credentials (such as a password) could be required to modify the document content, add comments, or print the document. The restrictions might not be enforced by encryption, but instead rely on any software that accesses the file to respect the restrictions that have been set.

TIP: These restrictions are not file system permissions (for example, making a file read-only). They are restrictions applied by the software package that created the file.

KeyView can report whether a document is protected by write restrictions, for the following file formats. A write restriction is defined as any restriction, enforced by a password, that prevents a user from editing the document content.

• Adobe Portable Document Format (.PDF)
• Microsoft Word (.DOCX)
• Microsoft Excel (.XLSX)
• Microsoft PowerPoint (.PPTX)

To determine whether a document is protected by restrictions

• In the Java API, use the method getRestrictions on the filter object. For example:

    Restrictions restrictions = filter.getRestrictions("document.docx");

Chapter 5: Sample Programs

This section describes the sample programs provided with Filter SDK.

· Introduction
· ExtractFilter
· FilterFileByChunk
· FilterFileToFile
· FilterFileToStream
· FilterStreamByChunk
· FilterStreamToFile
· FilterStreamToStream
· FilterTest

Introduction

The following Java sample programs are provided:

• ExtractFilter
• FilterFileByChunk
• FilterFileToFile
• FilterFileToStream
• FilterStreamByChunk
• FilterStreamToFile
• FilterStreamToStream
• FilterTest

The source code for the programs is in the directory javaapi/sample.
Included alongside the source code are compiled .class files, and the following Batch (.bat) and C Shell (.csh) files that help run the corresponding program:

    FilterFileToFile.bat (.csh)
    FilterStreamToStream.bat (.csh)
    FilterFileToStream.bat (.csh)
    FilterStreamToFile.bat (.csh)
    FilterFileByChunk.bat (.csh)
    FilterStreamByChunk.bat (.csh)

The sample programs pass license information to KeyView through the Filter constructor. This is the method recommended by Micro Focus. Before the sample code can be compiled, you must replace the placeholders YOUR_LICENSE_ORGANIZATION and YOUR_LICENSE_KEY with your license information.

The compiled .class files that are supplied in the SDK have an embedded trial license, which expires approximately five months after release. If the environment variables KV_SAMPLE_PROGRAM_LICENSE_ORGANIZATION and KV_SAMPLE_PROGRAM_LICENSE_KEY are set, those values are used instead, so that you can use the programs after the embedded trial license has expired, and test or troubleshoot with your own license.

NOTE: The sample programs that demonstrate the use of an input stream show filtering from a java.io.InputStream object. In KeyView version 12.9 and later, the stream methods are overloaded to allow you to pass a com.verity.api.SeekableInputStream implementation into KeyView. Micro Focus recommends this option, as it allows KeyView to seek about in the file, only reading the parts it needs to read. If you do need to use a Java InputStream, and you know the stream length, using the method overload that passes in the size might allow KeyView to avoid caching the whole file.

ExtractFilter

The ExtractFilter program demonstrates the File Extraction interface. The FilterTest sample program demonstrates the functionality of the Filtering interface. See FilterTest, on page 100.
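The license lookup described in the Introduction applies to every sample program, including ExtractFilter. The following Java sketch shows one way such a fallback could be written. It is not the sample source: the class and method names are hypothetical, and treating the override as active only when both variables are set is an assumption. The resolved values would then be passed to the Filter constructor, which is not shown here.

```java
import java.util.Map;

public class SampleLicense {
    static final String ORG_VAR = "KV_SAMPLE_PROGRAM_LICENSE_ORGANIZATION";
    static final String KEY_VAR = "KV_SAMPLE_PROGRAM_LICENSE_KEY";

    /**
     * Returns {organization, key}. The environment values win when both
     * variables are set; otherwise the compiled-in values are used.
     */
    public static String[] resolve(Map<String, String> env,
                                   String embeddedOrganization,
                                   String embeddedKey) {
        String organization = env.get(ORG_VAR);
        String key = env.get(KEY_VAR);
        if (organization != null && key != null) {
            return new String[] {organization, key};
        }
        return new String[] {embeddedOrganization, embeddedKey};
    }

    public static void main(String[] args) {
        // The placeholders stand in for the values that you compile into the sample.
        String[] license = resolve(System.getenv(),
                "YOUR_LICENSE_ORGANIZATION", "YOUR_LICENSE_KEY");
        System.out.println("License organization in use: " + license[0]);
    }
}
```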
The ExtractFilter program demonstrates the following functionality:

• opens a document
• extracts subfiles from a document
• repeats subfile extraction until all subfiles are extracted
• enables you to specify the command-line options listed in the following table

To run ExtractFilter

1. Add the location of the javaapi\KeyView.jar file, the javaapi\sample directory, and the Filter bin directory to the CLASSPATH environment variable.
2. Type the following:

    java -Djava.library.path=bin_directory ExtractFilter [options] bin_directory input_file output_dir

   where,
   bin_directory is the path to the Filter bin directory.
   options is one or more of the options listed in the following table.
   input_file is the path and file name of the source file.
   output_dir is the path of the folder to write the output files to. This folder does not have to exist.

Options for ExtractFilter Sample Program

Option
    Description

-extonly
    Extracts the subfiles from a source file but does not filter the files after extraction.
-ext-fbody
    Extracts the formatted version of the message body (HTML or RTF) from mail files when possible.
-source-cs charset
    Sets the character set of the source file. charset is a character set defined in the Filter class. See Coded Character Sets, on page 219.
-target-cs charset
    Sets the character set of the output file. charset is a character set defined in the Filter class. See Coded Character Sets, on page 219.
-little-end
    Sets the byte order for Unicode text to Little Endian.
-is
    Sets the input as a stream. The default is file.
-os
    Sets the output as a stream. The default is file.
-ip
    Runs file extraction in the same process as the calling application (in process). See Run Filter In Process, on page 27.
-open-user username
    Specifies the user name used to open a protected PST file.
-open-pass password
    Specifies the password used to open a protected PST file.
-openidfile idfile
    Specifies the user ID file used to open a protected PST file.
-opencreateroot
    Creates a root directory on which a hierarchy can be based. See Create a Root Node, on page 37.
-ext-nodir
    Specifies that the subfile directory structure is not created.
-extnoheader
    Excludes mail header information from the extracted message body text file. See Exclude Metadata from the Extracted Text File, on page 44.
-meta outfile
    Extracts default mail metadata and writes it to a file. See Extract Mail Metadata, on page 38.

FilterFileByChunk

The FilterFileByChunk program filters an input file to an output file using the Java API method doFilterChunk(). The method filters an input source and returns one chunk of output data. The program calls the method repeatedly until the entire file is processed.

Run FilterFileByChunk on Windows

To run FilterFileByChunk on Windows

1. In the FilterFileByChunk.bat file, set the following variables.

    INSTALL_DIR    The absolute path of the KeyView Filter SDK installation directory.
    PLATFORM       The platform name.

2. Run the batch file in the directory install\javaapi\sample, where install is the path name of the Filter installation directory. Type the following:

    filterfilebychunk inputfile outputfile

   where,
   inputfile is the path and file name of the source file.
   outputfile is the path and file name of the generated file. If a path is not specified, the file is output to the current directory.

Run FilterFileByChunk on UNIX

To run FilterFileByChunk on UNIX

1. In the FilterFileByChunk.csh file, set the following variables.

    INSTALL_DIR    The absolute path of the KeyView Filter SDK installation directory.
    PLATFORM       The platform name.

2. Run the C shell file in the directory install/javaapi/sample, where install is the path name of the Filter installation directory.
   Type the following:

    ./FilterFileByChunk.csh inputfile outputfile

   where,
   inputfile is the path and file name of the source file.
   outputfile is the path and file name of the generated file. If a path is not specified, the file is output to the current directory.

FilterFileToFile

The FilterFileToFile program filters an input file to an output file using Java API methods in Filter. It demonstrates the following functions:

• filters an input file to an output file.
• extracts the character set if it can be determined by the document reader.
• extracts file format information (document type, format, version, and so on) if available in the source document.
• extracts metadata if available in the source document. This program extracts all the metadata from the document, but only displays the first element of metadata.

Run FilterFileToFile on Windows

To run FilterFileToFile on Windows

1. In the FilterFileToFile.bat file, set the following variables.

    INSTALL_DIR    The absolute path of the KeyView Filter SDK installation directory.
    PLATFORM       The platform name.

2. Run the batch file in the directory install\javaapi\sample, where install is the path name of the Filter installation directory. Type the following:

    filterfiletofile inputfile outputfile

   where,
   inputfile is the path and file name of the source file.
   outputfile is the path and file name of the generated file. If a path is not specified, the file is output to the current directory.

Run FilterFileToFile on UNIX

To run FilterFileToFile on UNIX

1. In the FilterFileToFile.csh file, set the following variables.

    INSTALL_DIR    The absolute path of the KeyView Filter SDK installation directory.
    PLATFORM       The platform name.

2. Run the C shell file in the directory install/javaapi/sample, where install is the path name of the Filter installation directory.
   Type the following:

    ./FilterFileToFile.csh inputfile outputfile

   where,
   inputfile is the path and file name of the source file.
   outputfile is the path and file name of the generated file. If a path is not specified, the file is output to the current directory.

FilterFileToStream

The FilterFileToStream program filters an input file to an output stream using Java API methods in Filter.

Run FilterFileToStream on Windows

To run FilterFileToStream on Windows

1. In the FilterFileToStream.bat file, set the following variables.

    INSTALL_DIR    The absolute path of the KeyView Filter SDK installation directory.
    PLATFORM       The platform name.

2. Run the batch file in the directory install\javaapi\sample, where install is the path name of the Filter installation directory. Type the following:

    filterfiletostream inputfile

   where,
   • inputfile is the path and file name of the source file.
   • The generated text is output to the current DOS prompt.

Run FilterFileToStream on UNIX

To run FilterFileToStream on UNIX

1. In the FilterFileToStream.csh file, set the following variables.

    INSTALL_DIR    The absolute path of the KeyView Filter SDK installation directory.
    PLATFORM       The platform name.

2. Run the C shell file in the directory install/javaapi/sample, where install is the path name of the Filter installation directory. Type the following:

    ./FilterFileToStream.csh inputfile

   where,
   • inputfile is the path and file name of the source file.
   • The generated text is output to the current console (standard out).

FilterStreamByChunk

The FilterStreamByChunk program filters an input stream to an output stream using the Java API method doFilterChunk(). The method filters an input source and returns one chunk of output data. The program calls the method repeatedly until the entire output buffer is processed.
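The chunk-by-chunk pattern that FilterFileByChunk and FilterStreamByChunk demonstrate can be sketched as a simple loop. The code below does not call KeyView: ChunkSource is a hypothetical stand-in for the chunked filter call, and the convention that null marks the end of the output is an assumption made for this example. Check the sample source in javaapi/sample for the actual doFilterChunk() usage.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ChunkLoopSketch {
    /** Hypothetical stand-in for a chunked filter call; not the KeyView API. */
    public interface ChunkSource {
        /** Returns the next chunk of output, or null when no output remains. */
        byte[] nextChunk() throws IOException;
    }

    /** Adapts a plain InputStream so that the loop can be run without KeyView. */
    public static ChunkSource fromStream(InputStream in, int chunkSize) {
        return () -> {
            byte[] buffer = new byte[chunkSize];
            int read = in.read(buffer);
            return read < 0 ? null : Arrays.copyOf(buffer, read);
        };
    }

    /** Requests chunks repeatedly until the source is exhausted. */
    public static long drain(ChunkSource source, OutputStream out) throws IOException {
        long total = 0;
        byte[] chunk;
        while ((chunk = source.nextChunk()) != null) {
            out.write(chunk);
            total += chunk.length;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "filtered text goes here".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long written = drain(fromStream(new ByteArrayInputStream(data), 8), out);
        System.out.println(written + " bytes written");
    }
}
```

Processing one chunk at a time keeps memory use bounded by the chunk size rather than by the size of the filtered output.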
NOTE: In KeyView version 12.9 and later, Micro Focus recommends that you implement a com.verity.api.SeekableInputStream. See Input/Output Operations, on page 23.

Run FilterStreamByChunk on Windows

To run FilterStreamByChunk on Windows

1. In the FilterStreamByChunk.bat file, set the following variables.

    INSTALL_DIR    The absolute path of the KeyView Filter SDK installation directory.
    PLATFORM       The platform name.

2. Run the batch file in the directory install\javaapi\sample, where install is the path name of the Filter installation directory. Type the following:

    filterstreambychunk inputfile outputfile

   where,
   • inputfile is the path and file name of the source file.
   • outputfile is the path and file name of the generated file. If a path is not specified, the file is output to the current directory.

Run FilterStreamByChunk on UNIX

To run FilterStreamByChunk on UNIX

1. In the FilterStreamByChunk.csh file, set the following variables.

    INSTALL_DIR    The absolute path of the KeyView Filter SDK installation directory.
    PLATFORM       The platform name.

2. Run the C shell file in the directory install/javaapi/sample, where install is the path name of the Filter installation directory. Type the following:

    ./FilterStreamByChunk.csh inputfile outputfile

   where,
   • inputfile is the path and file name of the source file.
   • outputfile is the path and file name of the generated file. If a path is not specified, the file is output to the current directory.

FilterStreamToFile

The FilterStreamToFile program filters an input stream to an output file using Java API methods in Filter.

NOTE: In KeyView version 12.9 and later, Micro Focus recommends that you implement a com.verity.api.SeekableInputStream. See Input/Output Operations, on page 23.

Run FilterStreamToFile on Windows

To run FilterStreamToFile on Windows

1. In the FilterStreamToFile.bat file, set the following variables.
    INSTALL_DIR    The absolute path of the KeyView Filter SDK installation directory.
    PLATFORM       The platform name.

2. Run the batch file in the directory install\javaapi\sample, where install is the path name of the Filter installation directory. Type the following:

    filterstreamtofile inputfile outputfile

   where,
   inputfile is the path and file name of the source file.
   outputfile is the path and file name of the generated file. If a path is not specified, the file is output to the current directory.

Run FilterStreamToFile on UNIX

To run FilterStreamToFile on UNIX

1. In the FilterStreamToFile.csh file, set the following variables.

    INSTALL_DIR    The absolute path of the KeyView Filter SDK installation directory.
    PLATFORM       The platform name.

2. Run the C shell file in the directory install/javaapi/sample, where install is the path name of the Filter installation directory. Type the following:

    ./FilterStreamToFile.csh inputfile outputfile

   where,
   inputfile is the path and file name of the source file.
   outputfile is the path and file name of the generated file. If a path is not specified, the file is output to the current directory.

FilterStreamToStream

The FilterStreamToStream program filters an input stream to an output stream using Java API methods in Filter. It demonstrates the following functions:

• creates an input and an output stream, and filters the input stream to the output stream.
• extracts file format information (document type, format, version, and so on) if available in the source document.
• extracts metadata if available in the source document. This program extracts all the metadata from the document, but only displays the first element of metadata.

NOTE: In KeyView version 12.9 and later, Micro Focus recommends that you implement a com.verity.api.SeekableInputStream. See Input/Output Operations, on page 23.
Run FilterStreamToStream on Windows

To run FilterStreamToStream on Windows

1. In the FilterStreamToStream.bat file, set the following variables.

    INSTALL_DIR    The absolute path of the KeyView Filter SDK installation directory.
    PLATFORM       The platform name.

2. Run the batch file in the directory install\javaapi\sample, where install is the path name of the Filter installation directory. Type the following:

    filterstreamtostream inputfile

   where,
   • inputfile is the path and file name of the source file.
   • The generated text is output to the current DOS prompt.

Run FilterStreamToStream on UNIX

To run FilterStreamToStream on UNIX

1. In the FilterStreamToStream.csh file, set the following variables.

    INSTALL_DIR    The absolute path of the KeyView Filter SDK installation directory.
    PLATFORM       The platform name.

2. Run the C shell file in the directory install/javaapi/sample, where install is the path name of the Filter installation directory. Type the following:

    ./FilterStreamToStream.csh inputfile

   where,
   • inputfile is the path and file name of the source file.
   • The generated text is output to the current console (standard out).

FilterTest

The FilterTest program demonstrates most of the Filtering methods available in the Java API. It filters an input document to an output document and enables you to specify command-line options. The command-line options are listed in Options for FilterTest Sample Program, on the next page.

To run FilterTest

1. Add the location of the javaapi\KeyView.jar file, the javaapi\sample directory, and the Filter bin directory to the CLASSPATH environment variable.
2. Type the following command line:

    java -Djava.library.path=bin_directory FilterTest [options] bin_directory input_file output_file

   where,
   • bin_directory is the path to the Filter bin directory.
   • options is one or more of the options listed in Options for FilterTest Sample Program, below.
   • input_file is the path and file name of the source file.
   • output_file is the path and file name of the generated file. If a path is not specified, the file is output to the current directory.

Options for FilterTest Sample Program

Option
    Description

-is
    Sets the input as a stream. The default is file.
-os
    Sets the output as a stream. The default is file.
-chunk
    Filters an input source and returns one chunk of output data. The program calls the filter method repeatedly until the entire output buffer is processed.
-docformat filename
    Extracts the file format information and writes it to a file. filename is the name of the file to which the format information is written.
-summary filename
    Extracts the metadata and writes it to a file. filename is the name of the file to which the metadata is written. See Extract Metadata, on page 59.
-getTargetCS
    Extracts the character set used in the output file to the standard output.
-c charset
    Sets the character set of the output file. Use the option -getTargetCS to determine whether the target character set specified is used in the output file. charset is a character set defined in the Filter class. See Coded Character Sets, on page 219.
-cs charset
    Sets the character set of the source file. charset is a character set defined in the Filter class. See Coded Character Sets, on page 219.
-rc character
    Sets a replacement character for characters that cannot be mapped. The default is a question mark (?).
-ip
    Runs Filter in the same process as the calling application (in process). See Run Filter In Process, on page 27.
-ooplog
    Enables error logging. See Enable or Disable Error Logging, on page 56. Error logs are not generated when in-process filtering is enabled.
-oopmem
    Enables the memory trace system in the error logs. The memory trace system reports memory leaks and memory overwrites in the log file. See Report Memory Errors, on page 57. Error logs are not generated when in-process filtering is enabled.
-hf
    Extracts headers and footers, as well as the body text.
-hftags
    Puts tags around header and footer data.
-lo
    Specifies that PowerPoint PPT97 and PPTX file text data is output in a logical reading order.
-lsbmsb
    Uses LSBMSB byte order for Unicode text. LSBMSB is the "Least Significant Byte Most Significant Byte," or in other words, the byte order for Little Endian systems.
-msblsb
    For Unicode text, uses MSBLSB byte order. MSBLSB is the "Most Significant Byte Least Significant Byte," or in other words, the byte order for Big Endian systems.
-bomarker
    Generates the byte order marker for Unicode text.
-nodefcsconv
    Prevents default conversion of document character encoding. See Prevent the Default Conversion of a Character Set, on page 64.
-x xmlconfigfile
    Filters an XML file using customized extraction settings defined in the kvxconfig.ini file. If you do not enter the full path to the INI file, the program looks for the file in the current working directory. See Filter XML Files, on page 76.
-z tempdirectory
    Specifies a temporary directory where temporary files generated by the filtering process are stored. The default is the current working directory.
-ps password
    Specifies a password to open a password-protected PST file. This uses the Container API which is obsolete.
-pdflorder orderFlag
    Specifies that PDF files are output in a logical reading order. The parameter orderFlag is one of the following:
    • ltr: left-to-right paragraph direction.
    • rtl: right-to-left paragraph direction.
    • auto: The PDF filter determines the paragraph direction (left-to-right or right-to-left) for each PDF page, and then sets the direction accordingly.
    • raw: Unstructured paragraph flow.
    See Filter PDF Files, on page 64.
-rm
    If you set this option, text that was deleted from a document with revision tracking enabled is extracted from the document and included in the filtered output. See Extract Tracked Deleted Text, on page 64.
-embeddedfont
    If you set this option, text that contains embedded fonts is not filtered from PDF documents. See Filter PDF Files, on page 64.

Part III: Appendixes

This section lists supported formats, supported character sets, and redistributed files, and provides information on format detection and developing a custom document reader.

• Supported Formats
• Document Readers
• Platform Differences
• Character Sets
• Extract and Format Lotus Notes Subfiles
• File Format Detection
• List of Required Files for Redistribution
• Develop a Custom Reader
• Password Protected Files
• OCR Supported Languages

Appendix A: Supported Formats

This section lists the file formats that KeyView can detect.

· Key to Supported Formats Table
· Supported Formats
· File Classes

Key to Supported Formats Table

The supported formats table includes the following information:

Column
    Description

Format Name
    The format name that is returned by KeyView format detection.
    • In the C API, these values are defined in the ENdocFmt enumeration in adDocFmt.h.
    • In the .NET API these values are defined in the Autonomy.API.Filter.DocFormat enumeration.
    • In the Java API these values are defined in the com.verity.api.DocFormat enumeration.
    • In the C++ API these values are defined in keyview::Format, used in DetectionInfo which is returned by Session::detect().

Number
    The format number that is returned by KeyView format detection. This is the value associated with the Format Name in the relevant enumeration.

Category
    This value is used in the KeyView configuration file formats.ini to specify the reader to use to filter, export, or view the format. Several formats might have the same category value.

Description
    A short description of the file format.

MIME Type
    The MIME type (if any).

Extension
    A list of common file extensions for the file format.
    NOTE: This is not a complete list of file extensions. KeyView does not distinguish between file types based on their extension. Instead, it detects the file format based on the file content. This is more reliable because content cannot always be predicted from the file extension, and because some file extensions are associated with multiple formats.

File Class
    The KeyView file class.
    • In the C API, these values are defined in the ENdocClass enumeration in adinfo.h.
    • In the .NET API these values are defined in the Autonomy.API.Filter.DocClass enumeration.
    • In the Java API these values are defined in the com.verity.api.DocClass enumeration.
    • In the C++ API these values are defined in keyview::Category, used in DetectionInfo which is returned by Session::detect().
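To illustrate why detection by content is more reliable than detection by extension, the following Java sketch identifies three of the formats in the table from their leading bytes. It is a simplified illustration and not KeyView's detection algorithm, which examines content far more thoroughly. The class name is hypothetical, and the returned strings only reuse format names from the table; they are not values of the com.verity.api.DocFormat enumeration.

```java
import java.nio.charset.StandardCharsets;

public class MagicBytesSketch {
    /** Returns a format name from the supported formats table, based on leading bytes. */
    public static String sniff(byte[] head) {
        if (startsWith(head, "GIF87a")) {
            return "GIF_87a_Fmt";
        }
        if (startsWith(head, "GIF89a")) {
            return "GIF_89a_Fmt";
        }
        if (startsWith(head, "BM")) {
            // Two bytes are a weak signal; a real detector checks more of the header.
            return "BMP_Fmt";
        }
        return "Unknown_Fmt";
    }

    private static boolean startsWith(byte[] data, String signature) {
        byte[] expected = signature.getBytes(StandardCharsets.US_ASCII);
        if (data.length < expected.length) {
            return false;
        }
        for (int i = 0; i < expected.length; i++) {
            if (data[i] != expected[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A GIF file that has been renamed to .txt is still reported as a GIF.
        byte[] renamedGif = "GIF89a-image-data".getBytes(StandardCharsets.US_ASCII);
        System.out.println(sniff(renamedGif));
    }
}
```

The two GIF entries also show why an extension alone is not enough: both versions commonly use the .GIF extension, and only the file content distinguishes them.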
IDOL KeyView (12.13) Page 106 of 284 Supported Formats Format Name Number Reserved__Fmt -1 Unknown_Fmt 0 AES_Multiplus_Comm_ 1 Fmt ASCII_Text_Fmt 2 MSDOS_Batch_File_Fmt 3 Applix_Alis_Fmt 4 BMP_Fmt 5 Category -1 0 1 2 2 3 4 Description Multiplus (AES) Plain Text file MS-DOS Batch File Applix Asterix Windows Bitmap Image (BMP) MIME Type text/plain application/x-bat image/bmp CT_DEF_Fmt 6 5 Corel_Draw_Fmt 7 6 CGM_ClearText_Fmt 8 8 CGM_Binary_Fmt 9 8 CGM_Character_Fmt 10 8 Word_Connection_Fmt 11 9 COMET_TOP_Word_Fmt 12 10 CEOwrite_Fmt 13 11 DSA101_Fmt 14 12 DCA_RFT_Fmt 15 13 CDA_DDIF_Fmt 16 14 DG_CDS_Fmt 17 16 Micrografx_Draw_Fmt 18 18 Data_Point_VistaWord_ 19 19 Fmt DECdx_Fmt 20 20 Enable_WP_Fmt 21 21 Convergent Technologies DEF Comm. Format CorelDRAW (up to version 13/X3) Computer Graphics Metafile (CGM) Computer Graphics Metafile (CGM) Computer Graphics Metafile (CGM) Word Connection Nixdorf COMET TOP Financial Accounting software CEOwrite DSA101 (Honeywell Bull) IBM DCA-RFT (Revisable Form) CDA / DDIF DG Common Data Stream (CDS) Windows Draw (Micrografx) Vistaword application/coreldraw image/cgm application/dca-rft image/x-mgx-dsf DEC WPS Plus DX format Enable Word Processing application/dec-dx application/ewp IDOL KeyView (12.13) Extension PTF TXT BAT AX BMP CDR CGM CGM CGM CN CW RFT, DC DDIF CDS DRW DV DX WPF File Class AutoDetNoFormat AutoDetNoFormat adWORDPROCESSOR Readers adWORDPROCESSOR adEXECUTABLE adWORDPROCESSOR adRASTERIMAGE adWORDPROCESSOR adVECTORGRAPHIC adVECTORGRAPHIC adVECTORGRAPHIC adVECTORGRAPHIC adWORDPROCESSOR adACCOUNTING afsr afsr axsr bmpsr, kpbmprdr cdsr kpcdrrdr kpcgmrdr kpcgmrdr kpcgmrdr stringssr adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adVECTORGRAPHIC adWORDPROCESSOR stringssr stringssr dcasr stringssr stringssr adWORDPROCESSOR adWORDPROCESSOR stringssr Page 107 of 284 Filter SDK Java Programming Guide Appendix A: Supported Formats Format Name EPSF_Fmt Number 22 Category 22 Description Encapsulated 
PostScript Preview_EPSF_Fmt 23 22 Encapsulated PostScript MS_Executable_Fmt 24 23 G31D_Fmt 25 24 GIF_87a_Fmt 26 25 GIF_89a_Fmt 27 25 HP_Word_PC_Fmt 28 26 IBM_1403_LinePrinter_ 29 27 Fmt IBM_DCF_Script_Fmt 30 28 IBM_DCA_FFT_Fmt 31 29 Interleaf_Fmt 32 30 GEM_Image_Fmt 33 31 IBM_Display_Write_Fmt 34 32 Sun_Raster_Fmt 35 33 Ami_Pro_Fmt 36 35 Ami_Pro_StyleSheet_Fmt 37 35 MORE_Fmt 38 36 Lyrix_Fmt 39 37 MASS_11_Fmt 40 38 MacPaint_Fmt 41 39 MS_Word_Mac_Fmt 42 40 SmartWare_II_Comm_ 43 41 Fmt MS_Word_Win_Fmt 44 42 Multimate_Fmt 45 43 Multimate_Fnote_Fmt 46 43 Multimate_Adv_Fmt 47 43 Multimate_Adv_Fnote_ 48 43 Fmt MSDOS/Windows executable CCITT G3 1D Graphics Interchange Format (GIF87a) Graphics Interchange Format (GIF89a) HP Word PC IBM 1403 Line Printer DCF Script DCA-FFT (IBM Final Form) Interleaf GEM Bit Image IBM DisplayWrite Sun Raster image Lotus Ami Pro Lotus Ami Pro Style Sheet MORE Database MAC Lyrix Word Processing MASS-11 MacPaint Microsoft Word for Macintosh (up to version 3) SmartWare II Microsoft Word for Windows (up to version 6) MultiMate MultiMate Footnote File MultiMate Advantage MultiMate Advantage Footnote File MIME Type application/postscript application/postscript application/x-msdownload image/gif image/gif text/x-ibm-fft application/x-displaywrite image/x-cmu-raster application/x-lotus-amipro application/x-mass-11 image/x-macpaint application/msword application/msword application/x-multimate application/x-multimate-note IDOL KeyView (12.13) Extension EPS EXE GIF GIF HW I4 File Class adRASTERIMAGE, adVECTORGRAPHIC adRASTERIMAGE, adVECTORGRAPHIC adEXECUTABLE adRASTERIMAGE adRASTERIMAGE adRASTERIMAGE adWORDPROCESSOR adWORDPROCESSOR Readers kpepsrdr kpepsrdr exesr gifsr, kpgifrdr gifsr, kpgifrdr stringssr IC IF, FFT IMG IP RAS, RS, SUN SAM M1, M11 MAC, PIC, PNTG DOC adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adRASTERIMAGE adWORDPROCESSOR adRASTERIMAGE adWORDPROCESSOR adWORDPROCESSOR adOUTLINE adWORDPROCESSOR adWORDPROCESSOR adRASTERIMAGE 
adWORDPROCESSOR adCOMMUNICATION stringssr dw4sr kpsunrdr lasr lasr stringssr stringssr kpmacrdr mbsr DOC, WPS MM MMFN adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR misr stringssr stringssr stringssr stringssr Page 108 of 284 Filter SDK Java Programming Guide Appendix A: Supported Formats Format Name Number Multimate_Adv_II_Fmt 49 Multimate_Adv_II_Fnote_ 50 Fmt Multiplan_PC_Fmt 51 Multiplan_Mac_Fmt 52 MS_RTF_Fmt 53 MS_Word_PC_Fmt 54 MS_Word_PC_ 55 StyleSheet_Fmt MS_Word_PC_Glossary_ 56 Fmt MS_Word_PC_Driver_ 57 Fmt MS_Word_PC_Misc_Fmt 58 NBI_Async_Archive_Fmt 59 Navy_DIF_Fmt 60 NBI_Net_Archive_Fmt 61 NIOS_TOP_Fmt 62 FileMaker_Mac_Fmt 63 ODA_Q1_11_Fmt 64 ODA_Q1_12_Fmt 65 OLIDIF_Fmt 66 Office_Writer_Fmt 67 PC_Paintbrush_Fmt 68 CPT_Comm_Fmt 69 Lotus_PIC_Fmt 70 Mac_PICT_Fmt 71 Category 43 43 44 44 45 46 46 46 46 46 47 48 49 50 51 52 52 53 55 56 57 58 59 Description MultiMate Advantage II MultiMate Advantage II Footnote File Microsoft Multiplan (PC) Microsoft Multiplan (Mac) Rich Text Format (RTF) Microsoft Word for PC (up to version 6) Microsoft Word for PC (up to version 6) Style Sheet Microsoft Word for PC (up to version 6) Glossary Microsoft Word for PC (up to version 6) Driver Microsoft Word for PC (up to version 6) Miscellaneous File NBI Async Archive Format Navy DIF (document interchange format) NBI OASys Net Archive Format NIOS TOP Filemaker MAC ODA / ODIF Q1 11 ODA / ODIF Q1 12 OLIDIF (Olivetti) Office Writer PC Paintbrush Graphics (PCX) CPT Corporation word processor Lotus PIC Macintosh Raster / QuickDraw Picture MIME Type application/x-ms-multiplan application/x-ms-multiplan application/rtf application/x-ms-wordpc application/x-navy image/vnd.zbrush.pcx image/x-pict image/x-pict Philips_Script_Word_Fmt 72 60 PostScript_Fmt 73 61 PRIMEWORD_Fmt 74 62 Quadratron_Q_One_v1_ 75 63 Philips Script PostScript PRIMEWORD Q-One V1.93J application/postscript IDOL KeyView (12.13) Extension FBX, FNX RTF MW ND NN FP5, FP7 OD OD OW PCX PF PIC 
PCT PS Q1, QX File Class adWORDPROCESSOR adWORDPROCESSOR Readers stringssr stringssr adSPREADSHEET adSPREADSHEET adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR rtfsr mwsr mwsr adWORDPROCESSOR mwsr adWORDPROCESSOR mwsr adWORDPROCESSOR mwsr adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adDATABASE adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adRASTERIMAGE adWORDPROCESSOR adVECTORGRAPHIC adRASTERIMAGE, adVECTORGRAPHIC adWORDPROCESSOR adVECTORGRAPHIC adWORDPROCESSOR adWORDPROCESSOR stringssr nnsr stringssr stringssr stringssr kppcxrdr stringssr kppicrdr kppctrdr pwsr stringssr Page 109 of 284 Filter SDK Java Programming Guide Appendix A: Supported Formats Format Name Number Fmt Quadratron_Q_One_v2_ 76 Fmt SAMNA_Word_IV_Fmt 77 Ami_Pro_Draw_Fmt 78 Category 64 65 66 Description Q-One V2.0 SAMNA Word Lotus Ami Pro Draw SYLK_Spreadsheet_Fmt 79 67 SmartWare_II_WP_Fmt 80 68 Symphony_Fmt 81 69 Targa_Fmt 82 70 TIFF_Fmt 83 71 SYmbolic LinK (SYLK) format Informix SmartWare II word processor Lotus Symphony spreadsheet Truevision Targa image Tagged Image File Format (TIFF) Targon_Word_Fmt 84 72 Uniplex_Ucalc_Fmt 85 73 Uniplex_WP_Fmt 86 74 MS_Word_UNIX_Fmt 87 75 WANG_PC_Fmt 88 76 WordERA_Fmt 89 77 WANG_WPS_Comm_ 90 78 Fmt WordPerfect_Mac_Fmt 91 79 WordPerfect_Fmt 92 86 WordPerfect_VAX_Fmt 93 139 WordPerfect_Macro_Fmt 94 139 WordPerfect_Dictionary_ 95 139 Fmt WordPerfect_Thesaurus_ 96 139 Fmt WordPerfect_Resource_ 97 139 Fmt WordPerfect_Driver_Fmt 98 139 WordPerfect_Cfg_Fmt 99 139 WordPerfect_ 100 139 Targon Word Uniplex Ucalc Uniplex word processor Microsoft Word UNIX Wang IWP for PC WordERA WANG WPS (Word Processing System) WordPerfect MAC WordPerfect version 4 WordPerfect VAX WordPerfect Macro WordPerfect Spelling Dictionary WordPerfect Thesaurus WordPerfect Resource File WordPerfect Driver WordPerfect Configuration File WordPerfect Hyphenation Dictionary MIME Type application/vnd.symphony image/x-tga image/tiff application/msword 
application/x-wang-iwp application/x-corel-wordperfect application/x-corel-wordperfect application/x-corel-wordperfect application/vnd.wordperfect application/vnd.wordperfect application/vnd.wordperfect application/vnd.wordperfect application/vnd.wordperfect application/vnd.wordperfect application/vnd.wordperfect
Extension Q1, QX SAM SDW SLK DOC, SMT WR1 TGA TIF, TIFF TW SS UP DOC DC, GL, FR WF WP, WP4 MRS SPW WWK, PRS IRS, VRS PFX HYC File Class Readers adWORDPROCESSOR stringssr adWORDPROCESSOR adVECTORGRAPHIC, adRASTERIMAGE adSPREADSHEET adWORDPROCESSOR adSPREADSHEET adRASTERIMAGE adRASTERIMAGE, adFAXFORMAT adWORDPROCESSOR adSPREADSHEET adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR stringssr kpsdwrdr swsr kpTGArdr kptifrdr, tifsr stringssr stringssr stringssr stringssr adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR wpmsr stringssr adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR
Format Name Hyphenation_Fmt WordPerfect_Misc_Fmt WordMARC_Fmt Windows_Metafile_Fmt Number 101 102 103 Category 139 82 83 Description WordPerfect Miscellaneous File WordMARC Composer Windows Metafile Windows_Metafile_NoHdr_Fmt 104 83 SmartWare_II_DB_Fmt 105 84 WordPerfect_Graphics_Fmt 106 195 WordStar_Fmt 107 87 WANG_WITA_Fmt 108 88 Xerox_860_Comm_Fmt 109 89 Xerox_Writer_Fmt 110 91 DIF_SpreadSheet_Fmt 111 92 Enable_Spreadsheet_Fmt 112 93 SuperCalc_Fmt 113 94 UltraCalc_Fmt 114 95 SmartWare_II_SS_Fmt 115 96 SOF_Encapsulation_Fmt 116 97 PowerPoint_Win_Fmt 117 98 PowerPoint_Mac_Fmt 118 99 PowerPoint_95_Fmt 119 212 PowerPoint_97_Fmt 120 272 PageMaker_Mac_Fmt 121 100 PageMaker_Win_Fmt 122 101 MS_Works_Mac_WP_Fmt 123 103 MS_Works_Mac_DB_Fmt 124 104 MS_Works_Mac_SS_Fmt 125 105 MS_Works_Mac_Comm_ 126 106 Windows Metafile (no header) Informix SmartWare II database WordPerfect Graphics (version 2 and
higher) WordStar WANG WITA Xerox 860 Xerox Writer Data Interchange Format (DIF) Enable Spreadsheet Sorcim SuperCalc spreadsheet UltraCalc spreadsheet Informix SmartWare II spreadsheet Serialized Object Format (SOF) Microsoft PowerPoint PC (up to version 4) Microsoft PowerPoint MAC (up to version 4) Microsoft PowerPoint 95 Microsoft PowerPoint 97 PageMaker for Macintosh PageMaker for Windows Microsoft Works Word Processor for MAC Microsoft Works Database for MAC Microsoft Works Spreadsheet for MAC Microsoft Works Communication for MAC MIME Type application/vnd.wordperfect video/x-ms-wm image/wmf image/wmf database/x-smartdata application/vnd.wordperfect application/vnd.wordstar application/dif+xml application/vnd.epson.ssf application/x-supercalc5 application/x-smartware application/java-serialized-object application/x-ms-powerpoint application/x-ms-powerpoint application/x-ms-powerpoint application/x-ms-powerpoint application/x-msworks application/x-msworks application/x-msworks application/x-msworks
Extension WM, PW WMF WMF WPG, QPG WS, WSD WT DIF SSF CAL SOF PPT PPT PPT PPT MWK File Class Readers adWORDPROCESSOR adWORDPROCESSOR adVECTORGRAPHIC, adRASTERIMAGE adVECTORGRAPHIC stringssr kpwmfrdr kpwmfrdr adDATABASE adRASTERIMAGE, adVECTORGRAPHIC adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adSPREADSHEET adSPREADSHEET kpwg2rdr, kpwpgrdr stringssr stringssr stringssr stringssr difsr adSPREADSHEET adSPREADSHEET adSPREADSHEET adENCAPSULATION adPRESENTATION adPRESENTATION adPRESENTATION adPRESENTATION adDESKTOPPUBLSH adDESKTOPPUBLSH adWORDPROCESSOR kpp40rdr olesr kpp95rdr kpp97rdr stringssr adDATABASE adSPREADSHEET adCOMMUNICATION mwssr
Format Name Number Fmt MS_Works_DOS_WP_Fmt 127 MS_Works_DOS_DB_Fmt 128 MS_Works_DOS_SS_Fmt 129 MS_Works_Win_WP_Fmt 130 MS_Works_Win_DB_Fmt 131 MS_Works_Win_SS_Fmt 132 PC_Library_Fmt 133 MacWrite_Fmt 134
MacWrite_II_Fmt 135 Freehand_Fmt 136 Category 107 108 109 227 231 228 111 112 113 114 Description MIME Type Microsoft Works Word Processor for DOS application/x-msworks Microsoft Works Database for DOS application/x-msworks Microsoft Works Spreadsheet for DOS application/x-msworks Microsoft Works Word Processor for Windows (up to 2000) application/x-msworks Microsoft Works Database for Windows application/x-msworks Microsoft Works Spreadsheet for Windows application/x-msworks DOS/Windows Object Library application/x-archive MacWrite application/macwriteii MacWrite II application/macwriteii Adobe/Macrovision FreeHand image image/x-freehand Disk_Doubler_Fmt 137 115 HP_GL_Fmt 138 116 FrameMaker_Fmt 139 136 FrameMaker_Book_Fmt 140 136 Maker_Markup_Language_Fmt 141 174 Maker_Interchange_Fmt 142 117 JPEG_File_Interchange_Fmt 143 118 Reflex_Fmt 144 119 Framework_Fmt 145 276 Framework_II_Fmt 146 120 Paradox_Fmt 147 121 MS_Windows_Write_Fmt 148 123 Quattro_Pro_DOS_Fmt 149 124 Quattro_Pro_Win_Fmt 150 184 Disk Doubler HP Graphics Language FrameMaker FrameMaker Book Maker Markup Language vector/x-hpgl application/vnd.framemaker application/vnd.framemaker application/vnd.mif Adobe FrameMaker Interchange Format (MIF) JPEG File Interchange Format application/x-mif image/jpeg Borland Reflex database Framework office suite Framework II office suite Borland Paradox database Microsoft Windows Write Corel Quattro Pro for DOS Corel Quattro Pro for Windows database/reflex application/paradox application/x-ms-write application/x-quattropro application/x-quattro-win
Extension File Class Readers WPS adWORDPROCESSOR stringssr WDB adDATABASE adSPREADSHEET mwssr WPS, W40 adWORDPROCESSOR msw6sr, mswsr WKS, S30, S40 LIB, A FH3, FH4, FH5, FH7, FH8, FH9, FH10, FH11 HPGL, HPG FM, FRM BOOK adDATABASE adSPREADSHEET adLIBRARY adWORDPROCESSOR adWORDPROCESSOR adVECTORGRAPHIC mwssr stringssr stringssr adENCAPSULATION adVECTORGRAPHIC adDESKTOPPUBLSH adDESKTOPPUBLSH adDESKTOPPUBLSH
MIF JPG, JPEG, JFIF, JFI FW3 DB WRI WQ1 WB1, WB2, WB3 adWORDPROCESSOR adRASTERIMAGE mifsr jpgsr, kpjpgrdr adDATABASE adMIXED adMIXED adDATABASE adWORDPROCESSOR adSPREADSHEET adSPREADSHEET mwsr qpssr
Format Name Number Persuasion_Fmt 151 Windows_Icon_Fmt 152 Windows_Cursor_Fmt 153 MS_Project_Activity_Fmt 154 MS_Project_Resource_Fmt 155 MS_Project_Calc_Fmt 156 PKZIP_Fmt 157 Category 126 128 133 129 129 129 132 Description Adobe Persuasion Windows Icon Format Windows Cursor Microsoft Project (up to version 3) activity file Microsoft Project (up to version 3) resource file Microsoft Project (up to version 3) calc file ZIP Archive MIME Type image/vnd.microsoft.icon image/x-win-bitmap application/zip Quark_Xpress_Fmt 158 134 ARC_PAK_Archive_Fmt 159 135 MS_Publisher_Fmt 160 137 PlanPerfect_Fmt 161 138 WordPerfect_Auxiliary_Fmt 162 139 MS_WAVE_Audio_Fmt 163 141 MIDI_Audio_Fmt 164 142 AutoCAD_DXF_Binary_Fmt 165 143 AutoCAD_DXF_Text_Fmt 166 143 Quark Xpress MAC PAK/ARC Archive Microsoft Publisher (up to version 3) PlanPerfect Corel WordPerfect auxiliary file Microsoft Wave audio MIDI audio Autodesk AutoCAD DXF binary format Autodesk AutoCAD DXF text format application/x-mspublisher audio/wav audio/mid image/x-dxf image/x-dxf dBase_Fmt 167 144 OS_2_PM_Metafile_Fmt 168 145 Lasergraphics_Language_Fmt 169 146 AutoShade_Rendering_Fmt 170 147 GEM_VDI_Fmt 171 148 Windows_Help_Fmt 172 149 Volkswriter_Fmt 173 150 Ability_WP_Fmt 174 151 Ability_DB_Fmt 175 151 Ability_SS_Fmt 176 151 dBase Database III+/IV OS/2 PM Metafile Lasergraphics Language AutoShade Rendering GEM VDI Metafile image Windows Help File Volkswriter word processor Ability Word Processor Ability Database Ability Spreadsheet application/x-dbf application/x-autoshade application/winhlp
Extension ICO CUR ZIP, ZIPX ARC, PAK PUB WPW WAV MID, MIDI DXF DXF DBF, VCX MET GEM, GDI HLP VW4 File Class adPRESENTATION
adRASTERIMAGE adRASTERIMAGE adSCHEDULE adSCHEDULE Readers kpicordr adSCHEDULE adENCAPSULATION, adEXECUTABLE adDESKTOPPUBLSH adENCAPSULATION adDESKTOPPUBLSH adSCHEDULE adMISC, adENCAPSULATION adSOUND adSOUND adVECTORGRAPHIC adVECTORGRAPHIC adDATABASE adVECTORGRAPHIC adVECTORGRAPHIC unzip mspubsr MCI, riffsr MCI kpDXFrdr, kpODArdr kpDXFrdr, kpODArdr dbfsr adVECTORGRAPHIC adVECTORGRAPHIC adMISC adWORDPROCESSOR adWORDPROCESSOR adDATABASE adSPREADSHEET stringssr
Format Name Number Ability_Comm_Fmt 177 Ability_Image_Fmt 178 XyWrite_Fmt 179 CSV_Fmt 180 IBM_Writing_Assistant_Fmt 181 WordStar_2000_Fmt 182 HP_PCL_Fmt 183 UNIX_Exe_PreSysV_VAX_Fmt 184 UNIX_Exe_Basic_16_Fmt 185 UNIX_Exe_x86_Fmt 186 UNIX_Exe_iAPX_286_Fmt 187 UNIX_Exe_MC68k_Fmt 188 UNIX_Exe_3B20_Fmt 189 UNIX_Exe_WE32000_Fmt 190 UNIX_Exe_VAX_Fmt 191 UNIX_Exe_Bell_5_Fmt 192 UNIX_Obj_VAX_Demand_Fmt 193 UNIX_Obj_MS8086_Fmt 194 UNIX_Obj_Z8000_Fmt 195 AU_Audio_Fmt 196 NeWS_Font_Fmt 197 cpio_Archive_CRChdr_Fmt 198 cpio_Archive_CHRhdr_Fmt 199 PEX_Binary_Archive_Fmt 200 Sun_vfont_Fmt 201 Category 151 151 152 153 154 155 157 158 158 158 158 158 158 158 158 158 159 159 159 161 162 163 163 164 165 Description Ability Presentation Ability Image XYWrite / Nota Bene CSV (Comma Separated Values) IBM Writing Assistant WordStar 2000 HP Printer Command Language (PCL) UNIX executable (PDP-11/pre-System V VAX) UNIX executable (Basic-16) UNIX executable (x86) UNIX executable (iAPX 286) UNIX executable (MC680x0) UNIX executable (3B20) UNIX executable (WE32000) UNIX executable (VAX) UNIX executable (Bell 5.0) UNIX object module (VAX Demand) UNIX object module (old MS 8086) UNIX object module (Z8000) NeXT/Sun Audio Data NeWS bitmap font cpio archive (CRC Header) cpio archive (CHR Header) SUN PEX Binary Archive SUN vfont Definition MIME Type text/csv application/pcl application/octet-stream application/octet-stream application/octet-stream
application/octet-stream application/octet-stream application/octet-stream application/octet-stream application/octet-stream application/octet-stream application/octet-stream application/octet-stream application/octet-stream audio/basic application/x-cpio application/x-cpio
Extension XY4 CSV IWA WS2 PCL, PRN AU, SND CPIO CPIO File Class adCOMMUNICATION adRASTERIMAGE adWORDPROCESSOR adSPREADSHEET adWORDPROCESSOR Readers xywsr csvsr stringssr adWORDPROCESSOR adVECTORGRAPHIC adEXECUTABLE stringssr adEXECUTABLE adEXECUTABLE adEXECUTABLE adEXECUTABLE adEXECUTABLE adEXECUTABLE adEXECUTABLE adEXECUTABLE adOBJECTMODULE adOBJECTMODULE adOBJECTMODULE adSOUND MCI adFONT adENCAPSULATION adENCAPSULATION adENCAPSULATION adFONT
Format Name Curses_Screen_Fmt UUEncoded_Fmt WriteNow_Fmt PC_Obj_Fmt Windows_Group_Fmt TrueType_Font_Fmt Windows_PIF_Fmt MS_COM_Executable_Fmt StuffIt_Fmt PeachCalc_Fmt Wang_GDL_Fmt Q_A_DOS_Fmt Q_A_Win_Fmt WPS_PLUS_Fmt DCX_Fmt OLE_Fmt EBCDIC_Fmt DCS_Fmt UNIX_SHAR_Fmt Lotus_Notes_BitMap_Fmt Lotus_Notes_CDF_Fmt Compress_Fmt GZ_Compress_Fmt TAR_Fmt ODIF_FOD26_Fmt ODIF_FOD36_Fmt ALIS_Fmt Envoy_Fmt Number 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 Category 166 167 168 169 170 171 172 173 175 176 177 179 180 181 182 183 186 187 190 191 193 192 198 194 196 196 197 199 Description Curses Screen Image UU-encoded text WriteNow MAC DOS/Windows Object Module Windows Group TrueType Font Program Information File (PIF) PC (.COM) StuffIt (MAC) PeachCalc WANG Office GDL Header Symantec Q&A for DOS Symantec Q&A for Windows WPS-PLUS DCX FAX Format(PCX images) OLE Compound Document EBCDIC Text DCS SHAR shell archive format Lotus Notes Bitmap Lotus Notes CDF UNIX Compress archive GZ Compress archive TAR (tape archive) Open Document Architecture (ODA / ODIF) FOD26 Open Document Architecture (ODA / ODIF) FOD36 ALIS WordPerfect Envoy MIME Type text/x-uuencode application/octet-stream
application/x-font-ttf application/octet-stream application/octet-stream application/x-stuffit application/x-qa-write application/x-qa-write application/vnd.ms-wpl image/dcx application/ebcdic application/x-shar application/cdf application/x-compress application/gzip application/tar application/oda application/oda application/envoy
Extension UUE OBJ, EXP GRP TTF PIF COM HQX CAL JW WPL DCX OLE SHAR CDF Z GZ TAR F26 F36 EVY File Class adRASTERIMAGE adENCAPSULATION adWORDPROCESSOR adOBJECTMODULE adMISC adFONT adMISC adEXECUTABLE Readers uudsr stringssr adENCAPSULATION adSPREADSHEET adENCAPSULATION adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adFAXFORMAT adENCAPSULATION adWORDPROCESSOR adWORDPROCESSOR adENCAPSULATION adRASTERIMAGE stringssr stringssr stringssr kpdcxrdr olesr adWORDPROCESSOR adENCAPSULATION adENCAPSULATION adENCAPSULATION adWORDPROCESSOR stringssr kvzee, kvzeesr kvgz, kvgzsr tarsr adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR
Format Name PDF_Fmt Number 230 Category 200 Description Adobe PDF (Portable Document Format) BinHex_Fmt 231 206 SMTP_Fmt 232 207 MIME_Fmt 233 208 USENET_Fmt 234 264 SGML_Fmt 235 209 HTML_Fmt 236 210 ACT_Fmt 237 211 PNG_Fmt 238 213 MS_Video_Fmt 239 214 Windows_Animated_Cursor_Fmt 240 215 Windows_CPP_Obj_Storage_Fmt 241 216 Windows_Palette_Fmt 242 217 RIFF_DIB_Fmt 243 218 RIFF_MIDI_Fmt 244 219 RIFF_Multimedia_Movie_Fmt 245 220 MPEG_Fmt 246 221 QuickTime_Fmt 247 222 AIFF_Fmt 248 223 Amiga_MOD_Fmt 249 224 Amiga_IFF_8SVX_Fmt 250 225 Creative_Voice_Audio_Fmt 251 226 AutoDesk_Animator_FLI_Fmt 252 229 AutoDesk_AnimatorPro_FLC_Fmt 253 230 Compactor_Archive_Fmt 254 233 BinHex SMTP (Text Mail / Outlook Express) MIME (EML / MBX email)1 USENET SGML HTML ACT!
CRM software Portable Network Graphics (PNG) Video for Windows (AVI) Windows Animated Cursor Windows C++ Object Storage Windows Palette RIFF Device Independent Bitmap RIFF MIDI RIFF Multimedia Movie MPEG Movie QuickTime Movie, MPEG-4 audio Audio Interchange File Format (AIFF) Amiga MOD Amiga IFF (8SVX) Sound Creative Voice (VOC) AutoDesk Animator FLIC AutoDesk Animator Pro FLIC Compactor / Compact Pro MIME Type application/pdf application/mac-binhex40 message/rfc822 message/rfc822 message/news text/sgml text/html image/png video/avi audio/midi video/mpeg video/quicktime audio/aiff audio/x-8svx video/x-fli video/x-flc application/mac-compactpro
Extension PDF HQX SMTP EML, MBX SGML HTM, HTML ACT PNG AVI ANI PAL RMI MMM MOV, QT, MP4 AIF, AIFF, AIFC MOD IFF VOC FLI FLC File Class adWORDPROCESSOR adENCAPSULATION adENCAPSULATION adENCAPSULATION adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adRASTERIMAGE adMOVIE adRASTERIMAGE Readers kppdf2rdr, kppdfrdr, pdf2sr, pdfsr kvhqxsr emlsr mbxsr afsr htmsr kppngrdr, pngsr MCI kpanirdr adMIXED adRASTERIMAGE adRASTERIMAGE adSOUND adMOVIE adMOVIE adMOVIE adSOUND adSOUND adSOUND adSOUND MCI, mpeg4sr MCI, aiffsr adANIMATION adANIMATION adENCAPSULATION
Format Name Number VRML_Fmt 255 QuickDraw_3D_Metafile_Fmt 256 PGP_Secret_Keyring_Fmt 257 PGP_Public_Keyring_Fmt 258 PGP_Encrypted_Data_Fmt 259 PGP_Signed_Data_Fmt 260 PGP_SignedEncrypted_Data_Fmt 261 PGP_Sign_Certificate_Fmt 262 PGP_Compressed_Data_Fmt 263 PGP_ASCII_Public_Keyring_Fmt 264 PGP_ASCII_Encoded_Fmt 265 PGP_ASCII_Signed_Fmt 266 OLE_DIB_Fmt 267 SGI_Image_Fmt 268 Lotus_ScreenCam_Fmt 269 MPEG_Audio_Fmt 270 FTP_Software_Session_Fmt 271 Netscape_Bookmark_File_Fmt 272 Corel_Draw_CMX_Fmt 273 AutoDesk_DWG_Fmt 274 Category 234 235 236 237 238 239 240 241 246 242 243 244 245 247 248 249 250 210 252 253 Description VRML QuickDraw 3D Metafile PGP secret key PGP
public key PGP encrypted data PGP signed data PGP signed and encrypted data PGP signature certificate PGP compressed data ASCII-armored PGP public key ASCII-armored PGP-encoded message ASCII-armored PGP signed OLE DIB object SGI RGB Image Lotus ScreenCam MPEG-1 Audio layer3 (MP3) FTP Session Data Netscape Bookmark File Corel CMX AutoDesk AutoCAD Drawing (DWG) AutoDesk_WHIP_Fmt 275 254 Macromedia_Director_Fmt 276 255 AutoDesk WHIP Macromedia Shockwave/Adobe Director MIME Type model/vrml application/pgp application/pgp application/pgp application/pgp application/pgp application/pgp-signature application/pgp application/pgp application/pgp application/pgp image/sgi application/vnd.lotus-screencam audio/mpeg text/html application/cmx image/x-dwg application/x-director
Extension WRL File Class adVECTORGRAPHIC adVECTORGRAPHIC Readers adENCAPSULATION adENCAPSULATION adENCAPSULATION adENCAPSULATION adENCAPSULATION SIG adENCAPSULATION adENCAPSULATION PGP adENCAPSULATION adENCAPSULATION adENCAPSULATION adRASTERIMAGE RGB adRASTERIMAGE SCM adANIMATION MPEGA, MPG, MP3 adSOUND STE adCOMMUNICATION kpsgirdr MCI, mp3sr adWORDPROCESSOR htmsr CMX DWG WHP DCR, DXR, DIR adVECTORGRAPHIC adVECTORGRAPHIC adVECTORGRAPHIC adANIMATION kpDWGrdr, kpODArdr
Format Name Number Real_Audio_Fmt 277 MSDOS_Device_Driver_Fmt 278 Micrografx_Designer_Fmt 279 SVF_Fmt 280 Applix_Words_Fmt 281 Applix_Graphics_Fmt 282 MS_Access_Fmt 283 MS_Access_95_Fmt 284 MS_Access_97_Fmt 285 MacBinary_Fmt 286 Apple_Single_Fmt 287 Apple_Double_Fmt 288 Enhanced_Metafile_Fmt 289 MS_Office_Drawing_Fmt 290 XML_Fmt 291 DeVice_Independent_Fmt 292 Unicode_Fmt 293 Lotus_123_Worksheet_Fmt 294 Lotus_123_Format_Fmt 295 Lotus_123_97_Fmt 296 Lotus_Word_Pro_96_Fmt 297 Lotus_Word_Pro_97_Fmt 298 Freelance_DOS_Fmt 299 Freelance_Win_Fmt 300 Freelance_OS2_Fmt 301 Freelance_96_Fmt 302 Freelance_97_Fmt 303 MS_Word_95_Fmt 304 Category 256 257
258 259 261 262 263 263 263 265 266 267 270 271 285 274 275 81 81 81 268 268 140 140 140 140 140 189 Description Real Audio MSDOS Device Driver Micrografx Designer Simple Vector Format (SVF) Applix Words Applix Graphics Microsoft Access (versions 1 and 2) Microsoft Access 95 Microsoft Access 97 MacBinary Apple Single Apple Double Enhanced Metafile Microsoft Office Drawing XML DeVice Independent file (DVI) Unicode text file Lotus 1-2-3 Lotus 1-2-3 Formatting Lotus 1-2-3 97 Lotus Word Pro 96 Lotus Word Pro 97 Lotus Freelance for DOS Lotus Freelance for Windows Lotus Freelance for OS/2 Lotus Freelance 96 Lotus Freelance 97 Microsoft Word 95 MIME Type audio/x-pn-realaudio application/octet-stream image/x-svf application/x-applix-word application/x-msaccess application/msaccess application/msaccess application/x-macbinary multipart/appledouble image/x-emf text/xml application/x-dvi text/plain application/x-lotus-123 application/x-123 application/x-lotus-123 application/vnd.lotus-wordpro application/vnd.lotus-wordpro application/x-freelance application/x-freelance application/x-freelance application/x-freelance application/x-freelance application/msword
Extension RM, RA SYS File Class adSOUND adEXECUTABLE Readers DSF SVF AW AG MDB MDB MDB BIN AD EMF XML DVI adVECTORGRAPHIC adVECTORGRAPHIC adWORDPROCESSOR adPRESENTATION adDATABASE adDATABASE adDATABASE adENCAPSULATION adENCAPSULATION adENCAPSULATION adVECTORGRAPHIC adVECTORGRAPHIC adWORDPROCESSOR adVECTORGRAPHIC awsr kpagrdr mdbsr mdbsr mdbsr macbinsr kpemfrdr kpmsordr xmlsr UNI WKS, WK1, WK3, WK4 FM3 123 LWP, MWP LWP, MWP PRZ PRE, FLW PRS PRZ PRZ DOC adWORDPROCESSOR adSPREADSHEET unisr wkssr adSPREADSHEET adSPREADSHEET adWORDPROCESSOR adWORDPROCESSOR adPRESENTATION adPRESENTATION adPRESENTATION adPRESENTATION adPRESENTATION adWORDPROCESSOR l123sr l123sr lwpsr lwpsr kpprzrdr kpprerdr kpprerdr kpprzrdr kpprzrdr mw6sr
Format
Name Number MS_Word_97_Fmt 305 Excel_Fmt 306 Excel_Chart_Fmt 307 Excel_Macro_Fmt 308 Excel_95_Fmt 309 Excel_97_Fmt 310 Corel_Presentations_Fmt 311 Harvard_Graphics_Fmt 312 Harvard_Graphics_Chart_Fmt 313 Harvard_Graphics_Symbol_Fmt 314 Harvard_Graphics_Cfg_Fmt 315 Harvard_Graphics_Palette_Fmt 316 Lotus_123_R9_Fmt 317 Applix_Spreadsheets_Fmt 318 MS_Pocket_Word_Fmt 319 MS_DIB_Fmt 320 MS_Word_2000_Fmt 321 Excel_2000_Fmt 322 PowerPoint_2000_Fmt 323 MS_Access_2000_Fmt 324 MS_Project_4_Fmt 325 MS_Project_41_Fmt 326 MS_Project_98_Fmt 327 Folio_Flat_Fmt 328 HWP_Fmt 329 ICHITARO_Fmt 330 IS_XML_Fmt 331 Category 269 90 90 90 188 188 127 131 131 131 131 131 81 278 45 279 269 188 272 263 281 281 281 282 283 284 273 Description Microsoft Word 97 Microsoft Excel (up to version 5) Microsoft Excel (up to version 5) chart Microsoft Excel (up to version 5) macro Microsoft Excel 95 Microsoft Excel 97 Corel Presentations Harvard Graphics Harvard Graphics Chart Harvard Graphics Symbol File (v3) Harvard Graphics Configuration File Harvard Graphics Palette Lotus 1-2-3 Release 9 Applix Spreadsheets Microsoft Pocket Word for Handheld PC Microsoft Device Independent Bitmap Microsoft Word 2000 Microsoft Excel 2000 Microsoft PowerPoint 2000 Microsoft Access 2000 Microsoft Project 4 Microsoft Project 4.1 Microsoft Project 98 Folio Flat File Haansoft Hangul HWP (Arae-Ah Hangul) ICHITARO (v4-10) Extended or Custom XML MIME Type application/msword application/x-ms-excel application/x-ms-excel application/vnd.ms-excel application/x-ms-excel application/x-ms-excel application/x-corelpresentations application/x-lotus-123 application/x-applix-spreadsheet image/bmp application/msword application/x-ms-excel application/x-ms-powerpoint application/x-msaccess application/vnd.ms-project application/x-hwp application/x-ichitaro text/xml
Extension DOC, WPS, WBK XLS XLC XLM XLS XLS, XLR SHW, PRC PR4 CH3, CHT File Class adWORDPROCESSOR adSPREADSHEET adSPREADSHEET adSPREADSHEET
adSPREADSHEET adSPREADSHEET adPRESENTATION adPRESENTATION adVECTORGRAPHIC Readers mw8sr xlssr xlssr xlssr xlssr xlssr kpshwrdr SY3 adVECTORGRAPHIC adVECTORGRAPHIC PL adVECTORGRAPHIC 123 adSPREADSHEET l123sr AS adSPREADSHEET assr PWD DIB DOC XLS PPT MDB MPP MPP MPP FFF HWP JTD XML adWORDPROCESSOR adRASTERIMAGE adWORDPROCESSOR adSPREADSHEET adPRESENTATION adDATABASE adSCHEDULE adSCHEDULE adSCHEDULE adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR rtfsr mw8sr xlssr kpp97rdr mdbsr mppsr mppsr mppsr foliosr hwposr, hwpsr jtdsr
Format Name Oasys_Fmt PBM_ASC_Fmt PBM_BIN_Fmt PGM_ASC_Fmt PGM_BIN_Fmt PPM_ASC_Fmt PPM_BIN_Fmt XBM_Fmt XPM_Fmt FPX_Fmt PCD_Fmt MS_Visio_Fmt Number 332 333 334 335 336 337 338 339 340 341 342 343 Category 286 287 287 288 288 289 289 290 291 292 293 294 Description MIME Type Fujitsu OASYS application/vnd.fujitsu.oasys Portable Bitmap Utilities ASCII format (PBM) image/pbm Portable Bitmap Utilities BINARY format (PBM) image/pbm Portable Greymap Utilities ASCII format (PGM) image/x-pgm Portable Greymap Utilities BINARY format (PGM) image/x-pgm Portable Pixmap Utilities ASCII format (PPM) image/x-portable-pixmap Portable Pixmap Utilities BINARY format (PPM) image/x-portable-pixmap X-Window X Bitmap format (XBM) image/x-xbitmap X-Window X Pixmap format (XPM) image/xpm Kodak FlashPix FPX Image format image/fpx Kodak Photo CD Image format image/pcd Microsoft Visio (up to version 11) image/x-vsd MS_Project_2000_Fmt 344 281 MS_Outlook_Fmt 345 295 ELF_Relocatable_Fmt 346 159 ELF_Executable_Fmt 347 158 ELF_Dynamic_Lib_Fmt 348 160 MS_Word_XML_Fmt 349 285 MS_Excel_XML_Fmt 350 285 MS_Visio_XML_Fmt 351 285 SO_Text_XML_Fmt 352 314 SO_Spreadsheet_XML_Fmt 353 315 SO_Presentation_XML_Fmt 354 316 XHTML_Fmt 355 296 MS_OutlookPST_Fmt 356 297 Microsoft Project 2000 Microsoft Outlook message ELF Relocatable ELF Executable ELF Dynamic Library Microsoft Word 2003 XML Microsoft
Excel 2003 XML Microsoft Visio 2003 XML OpenDocument format (OpenOffice 1/StarOffice 6,7) Text XML OpenDocument format (OpenOffice 1/StarOffice 6,7) Spreadsheet XML OpenDocument format (OpenOffice 1/StarOffice 6,7) Presentation XML XHTML Microsoft Outlook Personal Folders File (.pst) application/vnd.ms-project application/vnd.ms-outlook application/octet-stream application/octet-stream application/octet-stream text/xml text/xml text/xml application/vnd.sun.xml.writer application/vnd.sun.xml.calc application/vnd.sun.xml.impress text/xhtml application/vnd.ms-outlook-pst RAR_Fmt 357 298 RAR archive format application/x-rar-compressed Lotus_Notes_NSF_Fmt 358 299 IBM Lotus Notes Database NSF/NTF application/x-lotus-notes
Extension OAS, OA2, OA3 PBM, PNM PBM, PNM PGM, PNM PGM, PNM PPM, PNM PPM, PNM XBM XPM FPX PCD VSD MPP MSG, OFT O SO XML XML VDX SXW File Class adWORDPROCESSOR adRASTERIMAGE adRASTERIMAGE adRASTERIMAGE adRASTERIMAGE adRASTERIMAGE adRASTERIMAGE adRASTERIMAGE adRASTERIMAGE adRASTERIMAGE adRASTERIMAGE adPRESENTATION adSCHEDULE adENCAPSULATION adOBJECTMODULE adEXECUTABLE adLIBRARY adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR Readers oa2sr olesr kpVSD2rdr, vsdsr mppsr msgsr xmlsr xmlsr xmlsr odfwpsr SXC, STC adSPREADSHEET sosr SXD, SXI adPRESENTATION kpodfrdr XML, XHTML, XHT PST RAR, REV, R00, R01 NSF adWORDPROCESSOR adENCAPSULATION adENCAPSULATION, adEXECUTABLE adENCAPSULATION pstnsr, pstsr, pstxsr rarsr nsfsr
Format Name Macromedia_Flash_Fmt MS_Word_2007_Fmt MS_Excel_2007_Fmt MS_PPT_2007_Fmt OpenPGP_Fmt Number 359 360 361 362 363 Intergraph_V7_DGN_Fmt 364 MicroStation_V8_DGN_Fmt 365 MS_Word_Macro_2007_Fmt 366 MS_Excel_Macro_2007_Fmt 367 MS_PPT_Macro_2007_Fmt 368 LZH_Fmt 369 Office_2007_Fmt 370 MS_XPS_Fmt 371 Lotus_Domino_DXL_Fmt 372 ODF_Text_Fmt 373 ODF_Spreadsheet_Fmt 374 ODF_Presentation_Fmt 375 Legato_Extender_ONM_Fmt 376
bin_Unknown_Fmt 377 TNEF_Fmt 378 CADAM_Drawing_Fmt 379 CADAM_Drawing_Overlay_Fmt 380 NURSTOR_Drawing_Fmt 381 HP_GLP_Fmt 382 Category 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 Description Macromedia Flash (.swf) Microsoft Word 2007 XML - Docx Microsoft Excel 2007 XML Microsoft PowerPoint 2007 XML OpenPGP/GPG Message Format (with new packet format) Intergraph Standard File Format (ISFF) V7 DGN (non-OLE) MicroStation V8 DGN (OLE) Microsoft Word Macro 2007 XML Microsoft Excel Macro 2007 XML Microsoft PPT Macro 2007 XML LZH Archive Office 2007 document that cannot be further classified (often RMS-encrypted) Microsoft Open XML Paper Specification (XPS/OXPS) IBM Domino Data in XML format (.dxl) ODF Text ODF Spreadsheet ODF Presentation Legato Extender Native Message ONM Bin unknown format (.xxx) Transport Neutral Encapsulation Format (TNEF) CADAM Drawing CADAM Drawing Overlay NURSTOR Drawing HP Graphics Language (Plotter) MIME Type application/x-shockwave-flash application/x-ms-word07 application/x-ms-excel07 application/x-ms-powerpoint07 application/pgp-encrypted application/x-ms-word07m application/x-ms-excel07m application/x-ms-powerpoint07m application/x-lzh-compressed application/vnd.ms-xpsdocument application/x-dxlfile application/vnd.oasis.opendocument.text application/vnd.oasis.opendocument.spreadsheet application/vnd.oasis.opendocument.presentation application/x-lotus-notes application/vnd.ms-tnef vector/x-hpgl2
Extension File Class SWF, SWD adWORDPROCESSOR DOCX, DOTX adWORDPROCESSOR XLSX, XLTX adSPREADSHEET PPTX, POTX, PPSX adPRESENTATION GPG, PGP adENCAPSULATION Readers swfsr mwxsr xlsxsr kpppxrdr DGN adVECTORGRAPHIC DGN adVECTORGRAPHIC olesr DOCM, DOTM adWORDPROCESSOR mwxsr XLSM, XLTM, XLAM adSPREADSHEET xlsxsr PPTM, POTM, PPSM, PPAM LZH, LHA DOCX, XLSX, PPTX, XLSB XPS, OXPS adPRESENTATION kpppxrdr adENCAPSULATION adMISC lzhsr adWORDPROCESSOR xpssr DXL ODT ODS ODP ONM adENCAPSULATION adWORDPROCESSOR
adSPREADSHEET adPRESENTATION adENCAPSULATION dxlsr odfwpsr odfsssr kpodfrdr onmsr CDD CDO adWORDPROCESSOR adENCAPSULATION adVECTORGRAPHIC adVECTORGRAPHIC tnefsr NUR HPG adVECTORGRAPHIC adVECTORGRAPHIC
Format Name ASF_Fmt WMA_Fmt WMV_Fmt EMX_Fmt Z7Z_Fmt Number 383 384 385 386 387 Category 324 325 326 327 328 Description Advanced Systems Format (ASF) Windows Media Audio Format (WMA) Windows Media Video Format (WMV) Legato EMailXtender Archives Format (EMX) 7-Zip archive (7z) MS_Excel_Binary_2007_Fmt 388 329 CAB_Fmt 389 330 CATIA_Fmt 390 331 Microsoft Excel Binary 2007 Microsoft Cabinet File (CAB) CATIA Formats (CAT*) YIM_Fmt 391 332 ODF_Drawing_Fmt 392 316 Founder_CEB_Fmt 393 333 QPW_Fmt 394 334 MHT_Fmt 395 335 MDI_Fmt 396 336 GRV_Fmt 397 337 IWWP_Fmt 398 338 IWSS_Fmt 399 339 IWPG_Fmt 400 340 BKF_Fmt 401 341 MS_Access_2007_Fmt 402 342 ENT_Fmt 403 343 DMG_Fmt 404 344 CWK_Fmt 405 345 OO3_Fmt 406 346 OPML_Fmt 407 347 Omni_Graffle_XML_Fmt 408 348 PSD_Fmt 409 349 Apple_Binary_PList_Fmt 410 350 Yahoo!
Instant Messenger History ODF Drawing/Graphics Founder Chinese E-paper Basic (ceb) Corel Quattro Pro 9+ for Windows MIME HTML MHTML format (MHT)1 Microsoft Document Imaging Format Microsoft Office Groove Format Apple iWork Pages format Apple iWork Numbers format Apple iWork Keynote format Microsoft Windows Backup File Microsoft Access 2007 Microsoft Entourage Database Format Mac Disk Copy Disk Image File AppleWorks (Claris Works) File Omni Outliner V3 File Omni Outliner OPML File Omni Graffle XML File Adobe Photoshop Document Apple Binary Property List format MIME Type application/x-ms-asf audio/x-ms-wma video/x-ms-wmv application/7z
Extension ASF WMA WMV EMX 7Z application/vnd.ms-excel.sheet.binary.macroenabled.12 XLSB File Class adMISC adSOUND adMOVIE adENCAPSULATION adENCAPSULATION, adEXECUTABLE adSPREADSHEET Readers asfsr asfsr asfsr emxsr z7zsr xlsbsr application/vnd.ms-cab-compressed application/vnd.oasis.opendocument.graphics application/ceb application/quattro-pro multipart/related image/vnd.ms-modi application/vnd.groove-injector application/vnd.apple.pages application/vnd.apple.numbers application/vnd.apple.keynote application/msaccess application/x-apple-diskimage application/appleworks image/vnd.adobe.photoshop application/x-bplist CAB CATPART, CATPRODUCT2 DAT ODG CEB QPW MHT, MHTML MDI GRV PAGES NUMBERS KEY BKF ACCDB DMG, ISO, IMAGE CWK OO3 OPML GRAFFLE PSD, PSB PLIST adENCAPSULATION adVECTORGRAPHIC adWORDPROCESSOR adVECTORGRAPHIC adWORDPROCESSOR adSPREADSHEET adWORDPROCESSOR adRASTERIMAGE adWORDPROCESSOR adWORDPROCESSOR adSPREADSHEET adPRESENTATION adENCAPSULATION adDATABASE adENCAPSULATION adENCAPSULATION adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adVECTORGRAPHIC adRASTERIMAGE adMISC cabsr kpCATrdr yimsr kpodfrdr cebsr qpwsr mhtsr iwwpsr iwsssr kpIWPGrdr bkfsr mdbsr entsr dmgsr stringssr oo3sr oo3sr kpGFLrdr psdsr
Format Name Apple_iChat_Fmt
OOUTLINE_Fmt BZIP2_Fmt ISO_Fmt DocuWorks_Fmt RealMedia_Fmt AC3Audio_Fmt NEF_Fmt SolidWorks_Fmt Number 411 412 413 414 415 416 417 418 419 Category 351 352 353 354 355 356 357 358 359 Description Apple iChat format OOutliner File Bzip 2 Compressed File ISO-9660 CD Disc Image Format DocuWorks Format RealMedia Streaming Media AC3 Audio File Format Nero Encrypted File SolidWorks Format Files MIME Type application/x-bzip2 application/x-iso9660-image application/vnd.fujixerox.docuworks application/vnd.rn-realmedia audio/ac3 XFDL_Fmt 420 366 Apple_XML_PList_Fmt 421 367 OneNote_Fmt 422 368 IFilter_Fmt 423 369 Dicom_Fmt 424 370 EnCase_Fmt 425 371 Extensible Forms Description Language Apple XML Property List format Microsoft OneNote Note Format iFilter Digital Imaging and Communications in Medicine (Dicom) Expert Witness Compression Format (EnCase) application/x-xfdl application/x-plist application/onenote application/dicom Scrap_Fmt 426 372 MS_Project_2007_Fmt 427 373 MS_Publisher_98_Fmt 428 374 Skype_Fmt 429 375 Hl7_Fmt 430 377 MS_OutlookOST_Fmt 431 378 Epub_Fmt 432 379 MS_OEDBX_Fmt 433 380 BB_Activ_Fmt DiskImage_Fmt Milestone_Fmt 434 381 435 382 436 383 Shell Scrap Object File Microsoft Project 2007 Microsoft Publisher from version 98 Skype Log File Health level7 message Microsoft Outlook Offline Folders File (OST) Open Publication Structure electronic publication Microsoft Outlook Express DBX Message Database BlackBerry Activation File Disk Image Milestone Document application/vnd.ms-project application/x-mspublisher application/vnd.ms-outlook-pst application/epub+zip E_Transcript_Fmt 437 384 RealLegal E-Transcript File
Extension File Class ICHAT adWORDPROCESSOR OOUTLINE adWORDPROCESSOR BZ2 adENCAPSULATION ISO adENCAPSULATION XDW adWORDPROCESSOR RM, RA adMOVIE AC3 adSOUND NEF adENCAPSULATION SLDASM, SLDPRT, SLDDRW, SLDDRT adVECTORGRAPHIC XFDL, XFD adPRESENTATION PLIST adMISC ONE adWORDPROCESSOR adWORDPROCESSOR DCM adRASTERIMAGE Readers ichatsr oo3sr
bzip2sr isosr olesr kpXFDLrdr onesr dcmsr
E01, L01, LX01 SHS MPP PUB DBB HL7 OST EPUB DBX
adENCAPSULATION adENCAPSULATION adSCHEDULE adDESKTOPPUBLSH adWORDPROCESSOR adWORDPROCESSOR adENCAPSULATION adWORDPROCESSOR adENCAPSULATION
encase2sr, encasesr olesr mppsr mspubsr skypesr hl7sr pffsr epubsr dbxsr
DAT DMG MLS, ML3, ML4, ML5, ML6, ML7, ML8, ML9, MLA PTX
adWORDPROCESSOR adENCAPSULATION adRASTERIMAGE adWORDPROCESSOR
Format Name Number PostScript_Font_Fmt 438 Ghost_DiskImage_Fmt 439 JPEG_2000_JP2_File_Fmt 440 Unicode_HTML_Fmt 441 CHM_Fmt 442 EMCMF_Fmt 443 MS_Access_2007_Tmpl_Fmt 444 Jungum_Fmt 445 JBIG2_Fmt 446 EFax_Fmt 447 AD1_Fmt 448 SketchUp_Fmt 449 GWFS_Email_Fmt 450 JNT_Fmt 451 Yahoo_yChat_Fmt 452 PaperPort_MAX_File_Fmt 453 ARJ_Fmt 454 RPMSG_Fmt 455 MAT_Fmt 456 SGY_Fmt 457 CDXA_MPEG_PS_Fmt 458 EVT_Fmt 459 EVTX_Fmt 460 MS_OutlookOLM_Fmt 461 WARC_Fmt 462 JAVACLASS_Fmt 463 VCF_Fmt 464 EDB_Fmt 465
Category 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 402 403 404 405 406 407 408 409 410 411 412 413
Description PostScript Type 1 Font; Ghost Disk Image File; JPEG-2000 JP2 File Format Syntax (ISO/IEC 15444-1); Unicode HTML; Microsoft Compiled HTML Help; Documentum EMCMF format; Microsoft Access 2007 Template
MIME Type application/x-font image/jp2 text/html application/x-chm
Samsung Electronics Jungum Global document; JBIG2 File Format; eFax file; AD1 Evidence file; Google SketchUp; GroupWise FileSurf email; Windows Journal format; Yahoo!
Messenger chat log; PaperPort MAX image file
application/jungum image/jbig2 image/max
ARJ (Archive by Robert Jung) file format application/arj; Microsoft Outlook Restricted Permission Message application/x-microsoft-rpmsg-message; MATLAB file format application/x-matlab-data; SEG-Y Seismic Data format; MPEG-PS container with CDXA stream video/mpeg; Microsoft Windows NT Event Log; Microsoft Windows Vista Event Log; Microsoft Outlook for Macintosh format; Web ARChive application/warc; Java Class format application/x-java-class; Microsoft Outlook vCard file format text/vcard; Microsoft Exchange Server Database file format
Extension PFB GHO, GHS JP2, JPF, J2K, JPWL, JPX, PGX HTM, HTML CHM EMCMF ACCDT
File Class adFONT adENCAPSULATION adRASTERIMAGE adWORDPROCESSOR adENCAPSULATION adENCAPSULATION adDATABASE
Readers pfasr jp2000sr, kpjp2000rdr unihtmsr chmsr msgsr
GUL JB2, JBIG2 EFX AD1 SKP GWFS JNT YCHAT MAX
adWORDPROCESSOR adRASTERIMAGE adFAXFORMAT adENCAPSULATION adVECTORGRAPHIC adENCAPSULATION adWORDPROCESSOR adWORDPROCESSOR adRASTERIMAGE
kpJBIG2rdr ad1sr gwfssr
ARJ RPMSG MAT, FIG SGY, SEGY MPG EVT EVTX OLM WARC CLASS VCF EDB
adENCAPSULATION adENCAPSULATION adWORDPROCESSOR adWORDPROCESSOR adMOVIE adMISC adMISC adENCAPSULATION adENCAPSULATION adWORDPROCESSOR adWORDPROCESSOR adENCAPSULATION
multiarcsr rpmsgsr olmsr vcfsr
Format Name ICS_Fmt MS_Visio_2013_Fmt
Number 466 467
Category 414 415
Description Microsoft Outlook iCalendar file format; Microsoft Visio 2013
MIME Type text/calendar application/vnd.visio
MS_Visio_2013_Macro_Fmt 468 415 ICHITARO_Compr_Fmt 469 417 IWWP13_Fmt 470 418 IWSS13_Fmt 471 419 IWPG13_Fmt 472 420 XZ_Fmt 473 421 Sony_WAVE64_Fmt 474 422 Conifer_WAVPACK_Fmt 475 423 Xiph_OGG_VORBIS_Fmt 476 424 MS_Visio_2013_Stencil_Fmt 477 415 MS_Visio_2013_Stencil_Macro_Fmt 478 415 MS_Visio_2013_Template_Fmt 479 415 MS_Visio_2013_Template_Macro_Fmt 480 415
Borland_Reflex_2_Fmt 481 425 PKCS_12_Fmt 482 426 B1_Fmt 483 427 ISO_IEC_MPEG_4_Fmt 484 428 RAR5_Fmt 485 429 Unigraphics_NX_Fmt 486 362 PTC_Creo_Fmt 487 430 KML_Fmt 488 431 KMZ_Fmt 489 432
Microsoft Visio 2013 macro application/vnd.visio
ICHITARO Compressed format; Apple iWork 2013 Pages format; Apple iWork 2013 Numbers format; Apple iWork 2013 Keynote format
application/x-js-taro
XZ archive format; Sony Wave64 format; Conifer Wavpack format; Xiph Ogg Vorbis format; MS Visio 2013 stencil format
application/x-xz audio/wav64 audio/x-wavpack audio/ogg application/vnd.visio
MS Visio 2013 stencil Macro format application/vnd.visio; MS Visio 2013 template format application/vnd.visio; MS Visio 2013 template Macro format application/vnd.visio
Borland Reflex 2 format; PKCS #12 (p12) format; B1 format; ISO/IEC MPEG-4 (ISO 14496) format; RAR5 Format; Unigraphics (UG) NX CAD Format; PTC Creo/Parametric/Elements/ProEngineer/Wildfire CAD Format; Keyhole Markup Language; Zipped Keyhole Markup Language
application/x-pkcs12 application/x-b1 video/mp4 application/x-rar-compressed application/x-prt application/vnd.google-earth.kml+xml application/vnd.google-earth.kmz
Extension File Class ICS, VCS adENCAPSULATION VSDX, VSTX, VSSX adPRESENTATION
VSDM, VSTM, VSSM JTDC IWA, PAGES IWA, NUMBERS IWA, KEY XZ W64 WV OGG VSSX
adPRESENTATION adWORDPROCESSOR adWORDPROCESSOR adSPREADSHEET adPRESENTATION adENCAPSULATION adSOUND adSOUND adSOUND adPRESENTATION
Readers icssr ActiveX components, kpVSDXrdr kpVSDXrdr jtdsr iwwp13sr iwss13sr kpIWPG13rdr, kpIWPGrdr multiarcsr kpVSDXrdr
VSSM adPRESENTATION kpVSDXrdr VSTX adPRESENTATION kpVSDXrdr VSTM adPRESENTATION kpVSDXrdr
R2D P12, PFX B1 MP4 RAR PRT ASM, DRW, PRT, FRM
adDATABASE adWORDPROCESSOR adENCAPSULATION adMOVIE adENCAPSULATION adVECTORGRAPHIC adVECTORGRAPHIC
b1sr mpeg4sr multiarcsr kpUGrdr
KML KMZ
adWORDPROCESSOR adWORDPROCESSOR
xmlsr unzip
Format Name WML_Fmt
ODF_Formula_Fmt SO_Text_Fmt
Number 490 491 492
Category 433 434 435
Description Wireless Markup Language; ODF Formula; Star Office 4,5 Writer Text
MIME Type text/vnd.wap.wml application/vnd.oasis.opendocument.formula application/vnd.stardivision.writer
SO_Spreadsheet_Fmt 493 436 SO_Presentation_Fmt 494 437 SO_Math_Fmt 495 438 STEP_Fmt 496 439 STL_Fmt 497 364 AppleScript_Fmt 498 440 Assembly_Fmt 499 441 C_Fmt 500 442 Csharp_Fmt 501 443 CPlusPlus_Fmt 502 444 Css_Fmt 503 445 Clojure_Fmt 504 446 CoffeeScript_Fmt 505 447 Lisp_Fmt 506 448 Dockerfile_Fmt 507 449 Eiffel_Fmt 508 450 Erlang_Fmt 509 451 Fsharp_Fmt 510 452 Fortran_Fmt 511 453 Go_Fmt 512 454 Groovy_Fmt 513 455 Haskell_Fmt 514 456 Ini_Fmt 515 457 Java_Fmt 516 458 Javascript_Fmt 517 459 Lua_Fmt 518 460 Makefile_Fmt 519 461
Star Office 4,5 Calc Spreadsheet application/vnd.stardivision.calc; Star Office 4,5 Impress Presentation application/vnd.stardivision.draw; Star Office 4,5 Math application/vnd.stardivision.math; ISO 10303-21 STEP format; 3D Systems Stereo Lithography STL ASCII format; AppleScript Source Code3 text/x-applescript; Assembly Code3 text/x-assembly; C Source Code3 text/x-c; C# Source Code3 text/x-csharp; C++ Source Code3 text/x-c++; Cascading Style Sheet3 text/css; Clojure Source Code3 text/x-clojure; CoffeeScript Source Code3 text/x-coffeescript; Common Lisp Source Code3 text/x-common-lisp; Dockerfile3 text/x-dockerfile; Eiffel Source Code3 text/x-eiffel; Erlang Source Code3 text/x-erlang; F# Source Code3 text/x-fsharp; Fortran Source Code3 text/x-fortran; Go Source Code3 text/x-go; Groovy Source Code3 text/x-groovy; Haskell Source Code3 text/x-haskell; Initialization (INI) file3 text/x-ini; Java Source Code3 text/x-java-source; Javascript Source Code3 text/javascript; Lua Source Code3 text/x-lua; Makefile3 text/x-makefile
Extension WML ODF SDW, SGL, VOR SDC SDD, SDA SMF APPLESCRIPT C, H CS CPP, HPP CSS CLJ, CL2 COFFEE, CAKE EL E ERL, ES FS F GO GRT, GVY HS JAVA JS LUA MAKE
File Class adWORDPROCESSOR adWORDPROCESSOR
adWORDPROCESSOR adSPREADSHEET adPRESENTATION adMISC adMISC adCAD adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE
Readers xmlsr unzip kpsdwrdr, starwsr starcsr kpsddrdr olesr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr
Format Name Mathematica_Fmt ObjC_Fmt ObjCpp_Fmt ObjJ_Fmt PHP_Fmt PLSQL_Fmt Pascal_Fmt Perl_Fmt Powershell_Fmt Prolog_Fmt Puppet_Fmt Python_Fmt R_Fmt Ruby_Fmt Rust_Fmt Scala_Fmt Shell_Fmt Smalltalk_Fmt ML_Fmt Swift_Fmt Tcl_Fmt Tex_Fmt TypeScript_Fmt Verilog_Fmt YAML_Fmt Wiki_Fmt MS_Word_2007_Flat_XML_Fmt Matroska_Fmt SVG_Fmt Shapefile_Fmt
Number 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546
Category 462 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 301
Description Wolfram Mathematica Source Code3; Objective-C Source Code3; Objective-C++ Source Code3; Objective-J Source Code3; PHP Source Code3; PLSQL Source Code3; Pascal Source Code3; Perl Source Code3; PowerShell Source Code3; Prolog Source Code3; Puppet Source Code3; Python Source Code3; R Source Code3; Ruby Source Code3; Rust Source Code3; Scala Source Code3; Shell Script3; Smalltalk Source Code3; Standard ML Source Code3; Swift Source Code3; Tool Command Language (Tcl) Source Code3; TeX Typesetting File3; TypeScript Source Code3; Verilog Source Code3; YAML File3; MediaWiki File3; Microsoft Word 2007 XML - Flat xml
547 489 548 490 549 491
Matroska video/audio File; Scalable Vector Graphics image; Shapefile
MIME Type text/x-mathematica text/x-objc text/x-objectivec++ text/x-objectivej text/x-php text/x-plsql
text/x-pascal text/x-perl text/x-powershell text/x-prolog text/x-puppet text/x-python text/x-rsrc text/x-ruby text/x-rust text/x-scala application/x-sh text/x-stsrc text/x-ml text/x-swift text/x-tcl application/x-tex text/x-typescript text/x-verilog text/x-yaml text/x-mediawiki text/xml video/x-matroska image/svg+xml application/x-shapefile IDOL KeyView (12.13) Extension M J PHP PASCAL PL PS1 PRO, PROLOG PP PY R RB RS SC SH ST ML SWIFT TM TS V YML XML MKV, MKA SVG SHP, SHX File Class adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adWORDPROCESSOR adWORDPROCESSOR Readers afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr mwxsr adMOVIE adVECTORGRAPHIC adGIS xmlsr Page 127 of 284 Filter SDK Java Programming Guide Appendix A: Supported Formats Format Name Number Flash_Video_Fmt 550 Embedded_OpenType_ 551 Fmt Web_Open_Font_Fmt 552 OpenType_Fmt 553 MNG_Fmt 554 JNG_Fmt 555 AppleScript_Binary_Fmt 556 Maya_Binary_Fmt 557 Jupiter_Tesselation_Fmt 558 OGV_Fmt 559 OGG_Container_Fmt 560 GNU_Message_Catalog_ 561 Fmt Windows_Shortcut_Fmt 562 Apple_Typedstream_Fmt 563 XCF_Fmt 564 PaintShop_Pro_Fmt 565 SQLite_Database_Fmt 566 MySQL_Table_Fmt 567 Microsoft_Program_DB_ 568 Fmt OpenEXR_Fmt 569 XMV_Fmt 570 AMV_Fmt 571 NIFF_Fmt 572 CuBase_Fmt 573 SoundFont_Fmt 574 WebP_Fmt 575 ICC_Fmt 576 PCF_Fmt 577 Category 492 493 494 495 496 497 498 499 363 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 Description Flash video File Embedded OpenType font Web Open Font Format OpenType Font Multiple-image Network Graphics JPEG Network Graphics AppleScript Binary Source Code Autodesk Maya binary file UGS Jupiter Tesselation 
file; Ogg Theora Video format; General Ogg Container format; GNU Message Catalog format; Windows shortcut file; Apple/NeXT typedstream data format; GIMP XCF image; PaintShop Pro image; SQLite database format; MySQL table definition file; Microsoft Program Database format; OpenEXR image format; 4X Movie File; AMV video file; Notation Interchange File Format; Steinberg Nuendo/CuBase file; SoundFont file; WebP image; International Color Consortium files; X11 Portable Compiled Font file
MIME Type video/x-flv application/vnd.ms-fontobject font/woff font/otf video/x-mng image/x-jng video/ogg application/ogg application/x-ms-shortcut image/x-xcf application/x-sqlite3 image/webp application/vnd.iccprofile application/x-font-pcf
Extension FLV EOT
File Class adMOVIE adFONT
WOFF, WOFF2 OTF MNG JNG SCPT MB JT OGV OGG MO
adFONT adFONT adANIMATION adRASTERIMAGE adSOURCECODE adCAD adCAD adMOVIE adMISC adMISC
LNK XCF PSP, PSPIMAGE QHC FRM PDB
adMISC adMISC adRASTERIMAGE adRASTERIMAGE adDATABASE adDATABASE adDATABASE
EXR 4XM AMV NIF NPR WEBP ICC, ICM PCF
adRASTERIMAGE adMOVIE adMOVIE adSOUND adSOUND adSOUND adRASTERIMAGE adMISC adFONT
Readers kpWEBPrdr
Format Name Number WebM_Fmt 578 AMFF_Fmt 579 ANBM_Fmt 580 ANIM_Fmt 581 DEEP_Fmt 582 FAXX_Fmt 583 ICON_Fmt 584 ILBM_Fmt 585 LWOB_Fmt 586 MAUD_Fmt 587 PBM_Fmt 588 TDDD_Fmt 589 DjVu_Fmt 590 InDesign_Fmt 591 Calamus_Fmt 592 Adaptive_MultiRate_Fmt 593 FLAC_Fmt 594 Ogg_FLAC_Fmt 595 SAS7BDAT_Fmt 596 Design_Web_Format_Fmt 597 Adobe_Flash_Audio_Book_Fmt 598 Adobe_Flash_Audio_Fmt 599 Adobe_Flash_Protected_Video_Fmt 600 Adobe_Flash_Video_Fmt 601 Audible_Audiobook_Fmt 602 Canon_Camera_Fmt 603 Canon_Raw_Fmt 604 Casio_Camera_Fmt 605
Category 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546
Description WebM video file; Amiga Metafile; IFF Animated Bitmap; IFF Amiga animated raster graphics format; IFF-DEEP
TVPaint image; IFF-FAXX Facsimile image; IFF Glow Icon image; Interleaved BitMap image; LightWave Object format; IFF-MAUD MacroSystem audio format; IFF Planar BitMap; IFF TDDD and Imagine Object animation format; AT&T DjVu format; Adobe InDesign document; Calamus Desktop Publishing; Adaptive Multi-Rate audio format; Free Lossless Audio Codec format; Ogg Container FLAC audio format; SAS7BDAT database storage format; Autodesk Design Web Format; Adobe Flash Player audio book; Adobe Flash Player audio; Adobe Flash Player protected video; Adobe Flash Player video; Audible Enhanced Audiobook; Canon Digital Camera image; Canon Raw image; Casio Digital Camera image
MIME Type video/webm image/vnd.djvu application/x-indesign audio/amr audio/flac model/vnd.dwf audio/mp4 audio/mp4 video/mp4 video/x-f4v audio/vnd.audible.aax
Extension WEBM AMF DEEP IFF LWOB TDD DJVU INDD AMR FLAC OGG SAS7BDAT DWF F4B F4A F4P F4V AAX CR3
File Class adMOVIE adVECTORGRAPHIC adRASTERIMAGE adRASTERIMAGE adRASTERIMAGE adRASTERIMAGE adRASTERIMAGE adRASTERIMAGE adMISC adSOUND adRASTERIMAGE adRASTERIMAGE adWORDPROCESSOR adDESKTOPPUBLSH adDESKTOPPUBLSH adSOUND adSOUND adSOUND adDATABASE adCAD
Readers sassr
adSOUND mpeg4sr adSOUND adMOVIE mpeg4sr mpeg4sr adMOVIE adSOUND adRASTERIMAGE adRASTERIMAGE adRASTERIMAGE mpeg4sr mpeg4sr
Format Name Number Convergent_Design_Fmt 606 DMB_MAF_Audio_Fmt 607 DMB_MAF_Video_Fmt 608 DMP_Content_Fmt 609 DVB_Fmt 610 Dirac_Wavelet_Compression_Fmt 611 HEICS_Image_Sequence_Fmt 612 HEIC_Image_Fmt 613 HEIFS_Image_Sequence_Fmt 614 HEIF_Image_Fmt 615 ISMACryp_Fmt 616 ISO_3GPP2_Fmt 617 ISO_3GPP_Fmt 618 ISO_JPEG2000_JP2_Fmt 619 ISO_JPEG2000_JPM_Fmt 620 ISO_JPEG2000_JPX_Fmt 621 ISO_QuickTime_Fmt 622 KDDI_Video_Fmt 623 MAF_Photo_Player_Fmt 624 MPEG4_AVC_Fmt 625 MPEG4_M4A_Fmt 626 MPEG4_M4B_Fmt 627 MPEG4_M4P_Fmt 628 MPEG4_M4V_Fmt 629 MPEG4_Sony_PSP_Fmt 630 MPEG_21_Fmt 631 Mobile_QuickTime_Fmt 632
Category
547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573
Description Convergent Design file; DMB MAF audio; DMB MAF video; Digital Media Project Content Format; Digital Video Broadcast format; ISO-BMFF Dirac Wavelet compression; High Efficiency Image Format HEVC image sequence; High Efficiency Image Format HEVC image; High Efficiency Image Format image sequence; High Efficiency Image Format image; ISMACryp 2.0 Encrypted format; 3GPP2 video file; 3GPP video file; ISO-BMFF JPEG 2000 image; ISO-BMFF JPEG 2000 compound image; ISO-BMFF JPEG 2000 with extensions; Apple ISO-BMFF QuickTime video; KDDI Video file; MAF Photo Player; ISO-BMFF MPEG-4 with AVC extension; Apple MPEG-4 Part 14 audio; Apple MPEG-4 Part 14 audio book; Apple MPEG-4 Part 14 protected audio; Apple MPEG-4 Part 14 video; Sony PSP MPEG-4; MPEG-21; Mobile QuickTime video
MIME Type video/vnd.dvb.file image/heic-sequence image/heic image/heif-sequence image/heif video/3gpp2 video/3gpp image/jp2 image/jpm image/jpx video/quicktime video/3gpp2 video/mp4 audio/x-m4a audio/mp4 audio/mp4 video/x-m4v audio/mp4 audio/mp4 video/quicktime
Extension DVB HEICS HEIC HEIFS HEIF 3G2 3GP JP2 JPM JPX QT, MOV M4A M4B M4P M4V MP4 MQV
File Class adRASTERIMAGE adSOUND adMOVIE adMISC adMOVIE adMISC adRASTERIMAGE adRASTERIMAGE adRASTERIMAGE adRASTERIMAGE adENCAPSULATION adMOVIE adMOVIE adRASTERIMAGE adRASTERIMAGE adRASTERIMAGE adMOVIE adMOVIE adMISC adMOVIE adSOUND adSOUND adSOUND adMOVIE adSOUND adMISC adMOVIE
Readers mpeg4sr mpeg4sr jp2000sr, kpjp2000rdr jp2000sr, kpjp2000rdr jp2000sr, kpjp2000rdr MCI mpeg4sr mpeg4sr mpeg4sr mpeg4sr mpeg4sr mpeg4sr mpeg4sr mpeg4sr MCI
Format Name Number Motion_JPEG_2000_Fmt 633
Category 574
Description Motion JPEG 2000
NTT_MPEG4_Fmt 634 575 Nero_MPEG4_AVC_Profile 635 576 Nero_MPEG4_Audio_Fmt 636 577 Nero_MPEG4_Profile 637 578 OMA_DRM_Fmt 638 579 Panasonic_Camera_Fmt
639 580 Ross_Video_Fmt 640 581 SDA_Video_Fmt 641 582 Samsung_Stereoscopic_Fmt 642 583 Sony_XAVC_Fmt 643 584 JPEG_2000_PGX_Fmt 644 585
NTT MPEG-4; Nero MPEG-4 profile with AVC extension; Nero AAC audio; Nero MPEG-4 profile; OMA DRM (ISOBMFF) Format; Panasonic Digital Camera image; Ross video; SDA SD Memory Card video; Samsung stereoscopic stream; Sony XAVC video; JPEG 2000 PGX Verification Model image
Apple_Desktop_Services_Store_Fmt 645 586 Core_Audio_Fmt 646 587 VICAR_Fmt 647 588
Apple Desktop Services Store file; Apple Core Audio Format; VICAR image format
FITS_Fmt 648 589 DIF_Fmt 649 590 MPEG_Transport_Stream_Fmt 650 591 MPEG_Sequence_Fmt 651 592 Ogg_OGM_Fmt 652 593 Ogg_Speex_Fmt 653 594 Ogg_Opus_Fmt 654 595 Musepack_Audio_Fmt 655 596 ART_Image_Fmt 656 597 Vivo_Fmt 657 598
Flexible Image Transport System FITS image; Digital Interface Format (DIF) DV video; MPEG Transport Stream data; MPEG Sequence format; Ogg OGM video format; Ogg Speex audio format; Ogg Opus audio format; Musepack audio format; ART image format; Vivo audio-video format
MIME Type video/mj2 video/mp4 video/mp4 audio/mp4 video/mp4 audio/x-caf image/fits video/MP2T video/mpeg video/ogg audio/ogg audio/ogg audio/x-musepack image/x-jg video/vnd.vivo
Extension MJ2, MJP2 PGX DS_Store CAF IMG, MAP, VIC, VICAR FIT DV TS, M2T, M2TS, MTS OGM SPX OGG MPC ART VIV
File Class adMOVIE adMOVIE adMOVIE adSOUND adMOVIE adMISC adRASTERIMAGE adMOVIE adMOVIE adMISC adMOVIE adRASTERIMAGE adMISC adSOUND adRASTERIMAGE adRASTERIMAGE adMOVIE adMISC adMISC adMOVIE adSOUND adSOUND adSOUND adRASTERIMAGE adMOVIE
Readers jp2000sr, kpjp2000rdr mpeg4sr mpeg4sr mpeg4sr jp2000sr, kpjp2000rdr
Format Name QCP_Fmt CSP_Codec_Fmt TwinVQ_Fmt Interplay_MVE_Fmt IRIX_Moviemaker_Fmt Sega_FILM_Fmt SMAF_Fmt NIST_SPHERE_Fmt Chinese_AVS_Fmt VQA_Fmt YAFA_Fmt Origin_MVE_Fmt BBC_Dirac_Fmt Maya_ASCII_Fmt RenderMan_Fmt NOFF_Binary_Fmt VTK_ASCII_Fmt VTK_Binary_Fmt
Wolfram_CDF_Fmt Wolfram_Notebook_Fmt HDF4_Fmt HDF5_Fmt ARMovie_Fmt Windows_TV_DVR_Fmt InstallShield_Z_Fmt MS_DirectDraw_Surface_Fmt Bink_Fmt LZMA_Fmt
Number 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685
Category 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626
Description Qualcomm QCP audio; Creative Signal Processor codec; NTT TwinVQ audio format; Interplay MVE video format; IRIX Silicon Graphics moviemaker video file; Sega FILM video format; Synthetic music Mobile Application Format; NIST SPeech HEader REsources format; Chinese AVS video format; Westwood Studios Vector Quantized Animation video file; Wildfire YAFA animation; Origin Wing Commander III MVE movie format; BBC Dirac video format; Autodesk Maya ASCII file format; Pixar RenderMan Interface Bytestream file; NOFF 3D Object File Format; Visualization Toolkit VTK ASCII format; Visualization Toolkit VTK Binary format; Wolfram Mathematica Computable Document Format; Wolfram Mathematica Notebook Format; Hierarchical Data Format HDF4; Hierarchical Data Format HDF5; Acorn RISC ARMovie video format; Windows Television DVR format; InstallShield Z archive format; Microsoft DirectDraw Surface container format; Bink audio-video container format; LZMA compressed data format
MIME Type audio/qcelp video/x-sgi-movie application/vnd.smaf video/x-dirac application/cdf application/x-hdf application/x-hdf application/x-compress application/x-lzma
Extension QCP CSP VQF MVE MV, MOVIE CPK, CAK MMF NIST VQA YAFA MVE DRC MA RIB NOFF VTK VTK CDF NB HDF, H4 HDF, H5 RPL WTV Z DDS BIK, BK2 LZMA
File Class adSOUND adMISC adSOUND adMOVIE adMOVIE adMOVIE adSOUND adSOUND adMOVIE adANIMATION
Readers
adANIMATION adMOVIE adMOVIE adCAD adVECTORGRAPHIC adVECTORGRAPHIC adVECTORGRAPHIC adVECTORGRAPHIC adMISC adMISC adMISC adMISC adMOVIE adMOVIE adENCAPSULATION adENCAPSULATION adMOVIE adENCAPSULATION
Format Name Number True_Audio_Fmt 686 Keepass_Fmt 687 RPM_Fmt 688 Printer_Font_Metrics_Fmt 689 Adobe_Font_Metrics_Fmt 690 Printer_Font_ASCII_Fmt 691 Netware_Loadable_Module_Fmt 692 TCPdump_pcap_Fmt 693 Multiple_Master_Font_Fmt 694 TrueType_Font_Collection_Fmt 695 Shapefile_Spatial_Index_Fmt 696 Java_Key_Store_Fmt 697 Java_JCE_Key_Store_Fmt 698 Quark_Xpress_Intel_Fmt 699 Windows_Imaging_Fmt 700 VMware_Virtual_Disk_Fmt 701 XPConnect_Typelib_Fmt 702 MS_DOS_Compression_Fmt 703 DLS_Fmt 704 MS_Windows_Registry_Fmt 705 Microsoft_Help_2_Fmt 706 Qt_Translation_Fmt 707 PEM_SSL_Certificate_Fmt 708 PostScript_Printer_Description_Fmt 709
Category 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650
Description True Audio format; Keepass Password file; RPM Package Manager file; Adobe Printer Font Metrics format; Adobe Font Metrics ASCII format; Adobe Printer Font ASCII format; Netware Loadable Module format; TCPdump packet stream capture savefile format; Adobe Multiple master font format; TrueType font collection format; Shapefile binary spatial index format; Java Key Store format; Java JCE Key Store format; QuarkXPress Intel format; Microsoft Windows Imaging Format WIM; VMware Virtual Disk Format 5.0; XPConnect Typelib Format; Microsoft MS-DOS installation compression (SZDD, KWAJ); DLS Downloadable Sounds format; Microsoft Windows Registry format; Microsoft Help 2.0 format; Qt binary translation file format; PEM-encoded SSL certificate; Adobe PostScript Printer Description file
MIME Type audio/x-tta application/x-rpm application/x-font-printer-metric application/x-font-adobe-metric application/x-font-type1 application/vnd.tcpdump.pcap application/x-font-ttf application/x-shapefile application/x-java-keystore application/x-java-jce-keystore application/vnd.quark.quarkxpress application/x-vmdk application/x-ms-compress application/x-ms-reader application/pkix-cert application/vnd.cups-ppd
Extension TTA KDB, KDBX RPM PFM
File Class adSOUND adMISC adENCAPSULATION adFONT
Readers
AFM PFA NLM
adFONT adFONT adMISC
afmsr pfasr
PCAP MMM
adMISC adFONT
TTC adFONT
SBX, SBN adGIS
KS adMISC adMISC
QXB WIM VMDK
adDESKTOPPUBLSH adENCAPSULATION adMISC
XPT adMISC
EX_ adENCAPSULATION
DLS adSOUND adMISC
HXD, HXW, HXH QM CRT, PEM, CER, KEY PPD
adENCAPSULATION adMISC adENCAPSULATION adMISC
Format Name Number Speedo_Font_Fmt 710 InstallShield_Cabinet_Fmt 711 InstallShield_Uninstall_Fmt 712 MS_OEDBX_Folder_Fmt 713 LabVIEW_Fmt 714 SAP_Archive_SAR_Fmt 715 Netscape_Address_Book_Fmt 716 Universal_3D_Fmt 717 Open_Inventor_ASCII_Fmt 718 Open_Inventor_Binary_Fmt 719 X_Window_Dump_Fmt 720 Git_Packfile_Fmt 721 Xara_Xar_Fmt 722 Internet_Archive_ARC_Fmt 723 Applix_Builder_Fmt 724 Applix_Bitmap_Fmt 725 PEM_RSA_Private_Key_Fmt 726 MIFF_Fmt 727 Subversion_Dump_Fmt 728 Virtual_Hard_Disk_Fmt 729 Direct_Access_Archive_Fmt 730 Debian_Binary_Fmt 731 XUL_Fastload_Fmt 732 Nastran_OP2_Fmt 733 Binary_Logging_Fmt 734
Category 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675
Description Speedo Font format; InstallShield Cabinet Archive format; InstallShield Uninstall format; Outlook Express DBX folder database format; National Instruments LabVIEW file format; SAP compression archive SAR format; Netscape Address Book format; Universal 3D file format; Open Inventor ASCII format; Open Inventor Binary format; X Window Dump image; Git Packfile format; Xara X Xar image format; Internet Archive ARC format; Applix Builder format; Applix Bitmap image format; PEM-encoded RSA private key; Magick Image File Format; Subversion Dump format; Microsoft Virtual Hard Disk format; PowerISO Direct Access Archive format; Debian binary package format; Mozilla XUL Fastload format; Nastran OP2 format; CAD Binary Logging Format
MIME Type image/x-xwindowdump application/vnd.xara
application/x-ia-arc application/x-vhd application/x-debian-package
Extension SPD CAB, HDR ISU DBX VI SAR NAB U3D IV IV XWD PACK XAR ARC AB IM PEM MIF, MIFF VHD DAA DEB MFL OP2 BLF
File Class adFONT adENCAPSULATION
Readers
adENCAPSULATION adENCAPSULATION adMISC adENCAPSULATION adMISC adVECTORGRAPHIC adVECTORGRAPHIC adVECTORGRAPHIC adRASTERIMAGE adWORDPROCESSOR adVECTORGRAPHIC adENCAPSULATION
gitpacksr
adMISC adRASTERIMAGE adENCAPSULATION adRASTERIMAGE adENCAPSULATION adENCAPSULATION adENCAPSULATION adENCAPSULATION adMISC adCAD adCAD
Format Name Number Measurement_Data_Fmt 735 Abaqus_ODB_Fmt 736 Open_Diagnostic_Data_Exchange_Fmt 737 Vector_ASCII_Fmt 738 LSDYNA_State_Database_Fmt 739 LSDYNA_Binary_Output_Fmt 740 MS_Power_BI_Fmt 741 Tableau_Workbook_Fmt 742 Tableau_Packaged_Workbook_Fmt 743 Tableau_Extract_Fmt 744 Tableau_Data_Source_Fmt 745 Tableau_Packaged_Data_Source_Fmt 746 Tableau_Preferences_Fmt 747 Tableau_Map_Source_Fmt 748 ABAP_Fmt 749 AMPL_Fmt 750 APL_Fmt 751 ASN1_Fmt 752 ATS_Fmt 753 Agda_Fmt 754 Alloy_Fmt 755 Apex_Fmt 756 Arduino_Fmt 757 AsciiDoc_Fmt 758 AspectJ_Fmt 759
Category 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700
Description CAD Measurement Data Format; Abaqus ODB Format; Vector Open Diagnostic Data Exchange format; Vector CAD ASCII ASC format; LS-DYNA State Database format; LS-DYNA binary output (binout) format; Microsoft Power BI Desktop format; Tableau Workbook format; Tableau Packaged Workbook format; Tableau Extract format; Tableau Data Source format; Tableau Packaged Data Source format; Tableau Preferences format; Tableau Map Source format; ABAP Source Code4; AMPL Source Code4; APL Source Code4; ASN.1 Source Code4; ATS Source Code4; Agda Source Code4; Alloy Source Code4; Apex Source Code4; Arduino Source Code4; AsciiDoc Source Code4; AspectJ Source Code4
MIME Type text/x-abap text/x-agda text/x-alloy
text/x-arduino text/x-asciidoc text/x-aspectj
Extension MDF ODB ODX ASC PBIX TWB TWBX TDE TDS TDSX TPS TMS ABAP AMPL APL ASN AGDA ALS CLS INO ASC AJ
File Class adCAD adCAD adCAD adCAD adCAD adCAD adANALYTICS adANALYTICS adANALYTICS adANALYTICS adANALYTICS adANALYTICS adANALYTICS adANALYTICS adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE
Readers xmlsr pbixsr xmlsr unzip xmlsr unzip xmlsr xmlsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr
Format Name Awk_Fmt BlitzMax_Fmt Bluespec_Fmt Brainfuck_Fmt Brightscript_Fmt CLIPS_Fmt CMake_Fmt COBOL_Fmt
Number 760 761 762 763 764 765 766 767
Category 701 702 703 704 705 706 707 708
Description Awk Source Code4; BlitzMax Source Code4; Bluespec Source Code4; Brainfuck Source Code4; Brightscript Source Code4; CLIPS Source Code4; CMake Source Code4; COBOL Source Code4
CWeb_Fmt 768 709 CartoCSS_Fmt 769 710 Ceylon_Fmt 770 711 Chapel_Fmt 771 712 Clarion_Fmt 772 713 Clean_Fmt 773 714 Component_Pascal_Fmt 774 715 Cool_Fmt 775 716 Coq_Fmt 776 717 Creole_Fmt 777 718 Crystal_Fmt 778 719 Csound_Fmt 779 720 Csound_Document_Fmt 780 721 Cuda_Fmt 781 722 D_Fmt 782 723 DIGITAL_Command_Language_Fmt 783 724 DTrace_Fmt 784 725 Dart_Fmt 785 726 E_Fmt 786 727 ECL_Fmt 787 728 Elm_Fmt 788 729
CWeb Source Code4; CartoCSS Source Code4; Ceylon Source Code4; Chapel Source Code4; Clarion Source Code4; Clean Source Code4; Component Pascal Source Code4; Cool Source Code4; Coq Source Code4; Creole Source Code4; Crystal Source Code4; Csound Source Code4; Csound Document Source Code4; Cuda Source Code4; D Source Code4; DIGITAL Command Language Source Code4; DTrace Source Code4; Dart Source Code4; E Source Code4; ECL Source Code4; Elm Source Code4
MIME Type text/x-awk text/x-bmx text/x-brainfuck text/x-cmake text/x-cobol text/x-ceylon text/x-component-pascal text/x-coq text/x-cuda
text/x-d text/x-dart application/x-ecl text/x-elm IDOL KeyView (12.13) Extension AWK BMX BSV B, BF BRS CLP CMAKE CBL, CCP, COB, CPY W MSS CEYLON CHPL CLW DCL, ICL CP CL V CREOLE CR ORC CSD CU DCL, ICL COM File Class adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE D DART E ECL ELM adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE Readers afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr Page 136 of 284 Filter SDK Java Programming Guide Appendix A: Supported Formats Format Name Emacs_Lisp_Fmt EmberScript_Fmt Fantom_Fmt Forth_Fmt FreeMarker_Fmt Frege_Fmt G_code_Fmt GAMS_Fmt GAP_Fmt GDScript_Fmt GLSL_Fmt Game_Maker_ Language_Fmt Gnuplot_Fmt Golo_Fmt Gosu_Fmt Gradle_Fmt GraphQL_Fmt Graphviz_DOT_Fmt HLSL_Fmt Hack_Fmt Haml_Fmt Handlebars_Fmt Hy_Fmt IDL_Fmt IGOR_Pro_Fmt Idris_Fmt Inform_7_Fmt Ioke_Fmt Isabelle_Fmt J_Fmt Number 789 790 791 792 793 794 795 796 797 798 799 800 Category 730 731 732 733 734 735 736 737 738 739 740 741 Description Emacs Lisp Source Code4 EmberScript Source Code4 Fantom Source Code4 Forth Source Code4 FreeMarker Source Code4 Frege Source Code4 G-code Source Code4 GAMS Source Code4 GAP Source Code4 GDScript Source Code4 GLSL Source Code4 Game Maker Language Source Code4 801 742 802 743 803 744 804 745 805 746 806 747 807 748 808 749 809 750 810 751 811 752 812 753 813 754 814 755 815 756 816 757 817 758 818 759 Gnuplot Source Code4 Golo Source Code4 Gosu Source Code4 Gradle Source Code4 GraphQL Source Code4 Graphviz (DOT) Source Code4 HLSL Source Code4 Hack Source Code4 Haml Source Code4 Handlebars Source Code4 Hy Source Code4 IDL Source Code4 IGOR Pro Source Code4 Idris Source 
Code4 Inform 7 Source Code4 Ioke Source Code4 Isabelle Source Code4 J Source Code4 MIME Type text/x-emacs-lisp application/x-fantom text/x-forth text/x-glslsrc text/x-gnuplot text/x-gosu text/x-haml text/x-hy text/x-idl text/ipf text/x-idris text/x-iokesrc text/x-isabelle text/x-j IDOL KeyView (12.13) Extension EL EM FAN FOR, FORTH FTL FR G GMS GD GLSL GML GNU, GP GOLO GS GRADLE GRAPHQL DOT HLSL HAML HBS HY PRO IPF IDR I7X IK IJS File Class adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE Readers afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr Page 137 of 284 Filter SDK Java Programming Guide Appendix A: Supported Formats Format Name JSONiq_Fmt JSX_Fmt Jasmin_Fmt Jolie_Fmt Julia_Fmt KiCad_Layout_Fmt KiCad_Schematic_Fmt Kotlin_Fmt LFE_Fmt LOLCODE_Fmt Lasso_Fmt Limbo_Fmt LiveScript_Fmt M_Fmt MAXScript_Fmt Markdown_Fmt Matlab_Fmt Max_Code_Fmt Mercury_Fmt Modelica_Fmt Modula_2_Fmt Monkey_Fmt Moocode_Fmt NL_Fmt NSIS_Fmt NetLogo_Fmt NewLisp_Fmt Nginx_Fmt Nix_Fmt Nu_Fmt Number 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 Category 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 463 776 777 778 779 780 781 782 783 784 785 786 787 788 Description JSONiq Source Code4 JSX Source Code4 Jasmin Source Code4 Jolie Source Code4 Julia Source Code4 KiCad Layout Source Code4 KiCad Schematic Source Code4 Kotlin Source Code4 LFE Source Code4 LOLCODE Source Code4 Lasso Source Code4 Limbo Source Code4 LiveScript Source Code4 M Source Code4 MAXScript Source Code4 
Markdown Source Code4 Matlab Source Code4 Max Source Code4 Mercury Source Code4 Modelica Source Code4 Modula-2 Source Code4 Monkey Source Code4 Moocode Source Code4 NL Source Code4 NSIS Source Code4 NetLogo Source Code4 NewLisp Source Code4 Nginx Source Code4 Nix Source Code4 Nu Source Code4 MIME Type text/x-julia text/x-kotlin text/x-lasso text/limbo text/x-livescript text/x-matlab text/x-modelica text/x-modula2 text/x-monkey text/x-moocode text/x-nsis text/x-newlisp text/x-nginx-conf text/x-nix IDOL KeyView (12.13) Extension JQ JSX J JL SCH KT LFE LOL LAS, LASSO LS M MS MD M MXT MO MOD MONKEY MOO NL NSI NLOGO NL VHOST NIX NU File Class adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE Readers afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr Page 138 of 284 Filter SDK Java Programming Guide Appendix A: Supported Formats Format Name OCaml_Fmt OpenCL_Fmt OpenEdge_ABL_Fmt OpenSCAD_Fmt Ox_Fmt Oxygene_Fmt Oz_Fmt PAWN_Fmt PLpgSQL_Fmt Pan_Fmt Parrot_Assembly_Fmt PicoLisp_Fmt Pike_Fmt Pony_Fmt Processing_Fmt PureBasic_Fmt QMake_Fmt RAML_Fmt RDoc_Fmt REXX_Fmt Racket_Fmt Ragel_Fmt Rascal_Fmt Rebol_Fmt Red_Fmt RenPy_Fmt RenderScript_Fmt Ring_Fmt RobotFramework_Fmt SAS_Fmt Number 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 Category 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 Description OCaml Source Code4 OpenCL Source Code4 OpenEdge ABL Source Code4 OpenSCAD Source Code4 Ox Source Code4 
Oxygene Source Code4 Oz Source Code4 PAWN Source Code4 PLpgSQL Source Code4 Pan Source Code4 Parrot Assembly Source Code4 PicoLisp Source Code4 Pike Source Code4 Pony Source Code4 Processing Source Code4 PureBasic Source Code4 QMake File4 RAML Source Code4 RDoc Source Code4 REXX Source Code4 Racket Source Code4 Ragel Source Code4 Rascal Source Code4 Rebol Source Code4 Red Source Code4 Ren'Py Source Code4 RenderScript Source Code4 Ring Source Code4 RobotFramework Source Code4 SAS Source Code4 MIME Type text/x-ocaml text/x-openedge text/x-pawn text/x-plpgsql text/x-pike text/x-rexx text/x-racket text/x-rebol text/x-red text/x-robotframework IDOL KeyView (12.13) Extension CL SCAD OX OXYGENE OZ PWN PLSQL PAN PASM PIKE PONY PDE PB RAML RDOC REXX RSC REB, REBOL RED RPY RS RING ROBOT SAS File Class adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE Readers afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr Page 139 of 284 Filter SDK Java Programming Guide Appendix A: Supported Formats Format Name SPARQL_Fmt SQL_Fmt SQLPL_Fmt SaltStack_Fmt Scheme_Fmt Scilab_Fmt Squirrel_Fmt Stan_Fmt Stata_Fmt Stylus_Fmt SuperCollider_Fmt SystemVerilog_Fmt TXL_Fmt Turing_Fmt Turtle_Fmt UrWeb_Fmt Vim_script_Fmt Visual_Basic_Fmt WebAssembly_Fmt WebIDL_Fmt X10_Fmt XQuery_Fmt Xojo_Fmt Xtend_Fmt YANG_Fmt Zephir_Fmt eC_Fmt reStructuredText_Fmt xBase_Fmt Windows_Installer_Fmt Number 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 Category 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 
834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 Description SPARQL format4 SQL format4 SQLPL Source Code4 SaltStack Source Code4 Scheme Source Code4 Scilab Source Code4 Squirrel Source Code4 Stan Source Code4 Stata Source Code4 Stylus Source Code4 SuperCollider Source Code4 SystemVerilog Source Code4 TXL Source Code4 Turing Source Code4 Turtle Source Code4 UrWeb Source Code4 Vim script File4 Visual Basic Source Code4 WebAssembly Source Code4 WebIDL Source Code4 X10 Source Code4 XQuery Source Code4 Xojo Source Code4 Xtend Source Code4 YANG Source Code4 Zephir Source Code4 eC Source Code4 reStructuredText Source Code4 xBase Source Code4 MSI Windows Installer format IDOL KeyView (12.13) MIME Type application/sparql-query text/x-sql text/x-scheme text/scilab text/supercollider text/x-systemverilog text/turtle text/x-vim text/x-vbasic text/x-x10 text/xquery text/x-xtend text/x-ecsrc text/x-rst application/x-ole-storage Extension SLS SCI NUT STAN STYL SC SV TXL T TTL UR, URS VIM VB WAT WEBIDL X10 XQM XTEND YANG ZEP EC MSI File Class adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adSOURCECODE adENCAPSULATION Readers afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr afsr olesr Page 140 of 284 Filter SDK Java Programming Guide Appendix A: Supported Formats Format Name Number Autodesk_3ds_Max_Fmt 909 PhotoDraw_Mix_Fmt 910 Softimage_SCN_Fmt 911 Parasolid_XT_Fmt 912 Parasolid_XB_Fmt 913 IGES_Fmt 914 ACE_Archive_Fmt 915 Grasshopper_GHX_Fmt 916 MS_FrontPage_Macro_ 917 Fmt MS_AtWork_Fax_Fmt 918 MS_Image_Composer_ 919 Fmt MS_Visual_InterDev_Fmt 920 Macromedia_Flash_FLA_ 
921 OLE_Fmt Corel_Draw_X4_Fmt 922 Ogg_Daala_Fmt 923 Ogg_BBC_Dirac_Fmt 924 PKCS_7_Fmt 925 Time_Stamped_Data_ 926 Fmt Sereal_Fmt 927 Associated_Signature_ 928 Simple_Fmt Associated_Signature_ 929 Extended_Fmt iBooks_Fmt 930 PDF_Forms_Data_Fmt 931 PDF_XML_Forms_Data_ 932 Fmt AxCrypt_Fmt 933 Unix_Archive_Fmt 934 Category 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 Description Autodesk 3ds Max format PhotoDraw MIX image Softimage Scene SCN format Parasolid ascii XT format Parasolid binary XB format Initial Graphics Exchange Specification format ACE archive format Grasshopper GHX format Microsoft FrontPage macro file format MIME Type image/vnd.mix model/iges application/x-ace-compressed Microsoft AtWork Fax format Microsoft Image Composer format Microsoft Visual InterDev web project items file Macromedia Flash FLA Project File OLE format CorelDRAW version X4 onwards Ogg Daala video format Ogg BBC Dirac video format PKCS #7 cryptographic format Time-stamped data format application/x-vnd.corel.zcf.draw.document+zip video/daala video/x-dirac application/pkcs7-signature application/timestamped-data Sereal data serialization format Associated Signature Container Simple format application/sereal application/vnd.etsi.asic-s+zip Associated Signature Container Extended format application/vnd.etsi.asic-e+zip Apple iBooks format PDF Forms Data Format PDF XML Forms Data Format application/x-ibooks+zip application/vnd.fdf application/vnd.adobe.xfdf AxCrypt encrypted document Unix Archive ar format application/x-axcrypt application/x-archive Extension MAX MIX SCN X_T X_B IGS ACE GHX FPM AWD MIC WDM FLA CDRX OGV OGV P7S TSD SRL ASICS ASICE IBOOKS FDF XFDF AXX AR File Class adCAD adRASTERIMAGE adCAD adCAD adCAD adCAD adENCAPSULATION adCAD adWORDPROCESSOR Readers olesr olesr xmlsr adFAXFORMAT adRASTERIMAGE olesr adSWDEV adWORDPROCESSOR adVECTORGRAPHIC adMOVIE adMOVIE adENCAPSULATION adENCAPSULATION pkcs7sr adMISC adENCAPSULATION 
adENCAPSULATION adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR epubsr xmlsr adENCAPSULATION adENCAPSULATION IDOL KeyView (12.13) Page 141 of 284 Filter SDK Java Programming Guide Appendix A: Supported Formats Format Name Number Berkeley_Btree_ 935 Database_Fmt Berkeley_Hash_ 936 Database_Fmt Berkeley_Log_Database_ 937 Fmt Berkeley_Queue_ 938 Database_Fmt BitTorrent_Fmt 939 Chrome_Extension_Fmt 940 Dalvik_Executable_Fmt 941 Foxmail_Fmt 942 GRIB_Fmt 943 Zstandard_Fmt 944 LZ4_Fmt 945 MS_Money_Fmt 946 NetCDF_Fmt 947 SAS6_Data_Fmt 948 SAS_Transport_Fmt 949 Snappy_Framed_Fmt 950 Stata_Data_Fmt 951 SPSS_SAV_Fmt 952 Zoo_Archive_Fmt 953 CDX_Fmt 954 CDXML_Fmt 955 BPG_Fmt 956 Apple_Icon_Fmt 957 NITF_Fmt 958 ERDAS_Imagine_Fmt 959 MS_Office_Temporary_ 960 Owner_Fmt Category 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 Description Berkeley DB btree database format Berkeley DB hash database format Berkeley DB log database format Berkeley DB queue database format BitTorrent file format Google Chrome Extension format Dalvik Executable dex format Foxmail email format General Regularly-distributed Information in Binary form GRIB format Zstandard compression format LZ4 compressed file Microsoft Money format Network Common Data Form NetCDF format SAS 6 Data storage format SAS Transport File XPORT format Snappy Framed compression format Stata Data Format SPSS Statistics Data File Format Zoo Compressed Archive Format ChemDraw CDX format ChemDraw CDXML format Better Portable Graphics BPG format Apple Icon image format National Imagery Transmission Format NITF image ERDAS Imagine image format Microsoft Office temporary owner file MIME Type application/x-berkeley-db application/x-berkeley-db application/x-berkeley-db application/x-berkeley-db application/x-bittorrent application/x-chrome-package application/x-dex application/x-foxmail application/x-grib application/zstd application/x-lz4 application/x-msmoney application/x-netcdf 
application/x-sas-data-v6 application/x-sas-xport application/x-snappy-framed application/x-stata-dta application/x-zoo chemical/x-cdx application/vnd.chemdraw+xml image/x-bpg image/icns image/nitf application/x-erdas-hfa application/x-ms-owner IDOL KeyView (12.13) Extension DB DB TORRENT CRX DEX BOX GRB, GRIB2 ZSTD LZ4 MNY NC SD2 XPT, XPORT SZ DTA SAV ZOO CDX CDXML BPG ICNS NTF, NITF HFA, RRD, AUX File Class adDATABASE Readers adDATABASE adDATABASE adDATABASE adMISC adENCAPSULATION adEXECUTABLE adWORDPROCESSOR adSCIENTIFIC adENCAPSULATION adENCAPSULATION adSPREADSHEET adMISC adDATABASE adDATABASE adENCAPSULATION adDATABASE adDATABASE adENCAPSULATION adSCIENTIFIC adSCIENTIFIC adRASTERIMAGE adRASTERIMAGE adRASTERIMAGE zstdsr xmlsr adRASTERIMAGE adMISC Page 142 of 284 Filter SDK Java Programming Guide Appendix A: Supported Formats Format Name EAC3_Audio_Fmt COFF_Relocatable_Fmt Number 961 962 COFF_Executable_Fmt 963 COFF_Dynamic_Lib_Fmt 964 ELF_Core_Fmt 965 Purify_Fmt 966 Kryptel_Fmt 967 Windows_Core_Dump_ 968 Fmt Qt_Prerendered_Font_ 969 Fmt AIX_Relocatable_Fmt 970 AIX_Executable_Fmt 971 AIX_Dynamic_Lib_Fmt 972 HPUX_Relocatable_Fmt 973 HPUX_Executable_Fmt 974 HPUX_Dynamic_Lib_Fmt 975 XML_EBCDIC_Fmt 976 MPEG_JVT_H264_Fmt 977 Material_Exchange_Fmt 978 MS_Agent_Character_ 979 Fmt Quicken_Fmt 980 MS_Outlook_Address_ 981 Fmt MS_Answer_Wizard_Fmt 982 ADX_Fmt 983 System_Deployment_ 984 Image_Fmt Free_Lossless_Image_ 985 Fmt Category 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 Description Enhanced-AC3 (EAC3) Audio File format Common Object File Format (COFF) relocatable object Common Object File Format (COFF) executable Common Object File Format (COFF) dynamic library ELF Core file Rational Purify data file Kryptel encrypted file Windows heap or mini core dump file MIME Type audio/eac3 application/x-object-file application/x-executable-file application/x-library-file application/x-coredump application/x-dmp Qt 
Prerendered Font format AIX/RISC COFF relocatable object AIX/RISC COFF executable AIX/RISC COFF dynamic library HPUX/PA-RISC COFF relocatable object HPUX/PA-RISC COFF executable HPUX/PA-RISC COFF dynamic library EBCDIC-encoded XML file MPEG JVT-NAL sequence H264 video Material Exchange Format audio-video container format Microsoft Agent Character file application/x-object-file application/x-executable-file application/x-library-file application/x-object-file application/x-executable-file application/x-library-file application/xml video/h264 application/mxf Quicken data file Microsoft Outlook address file Microsoft Answer Wizard file ADX audio file Microsoft System Deployment Image SDI format Free Lossless Image Format (FLIF) image/flif IDOL KeyView (12.13) Extension AC3 O PFY EDC DMP QPF2 A SL XML 264 MXF ACS QDF WAB ADX SDI FLIF File Class adSOUND adOBJECTMODULE Readers adEXECUTABLE adLIBRARY adMISC adMISC adENCAPSULATION adMISC adFONT adOBJECTMODULE adEXECUTABLE adLIBRARY adOBJECTMODULE adEXECUTABLE adLIBRARY adWORDPROCESSOR adMOVIE adMOVIE adMOVIE adACCOUNTING adMISC adMISC adSOUND adMISC adRASTERIMAGE Page 143 of 284 Filter SDK Java Programming Guide Appendix A: Supported Formats Format Name DPX_Fmt Avro_Fmt InstallShield_Archive_ Fmt Mac_Executable_Fmt GDSII_Fmt ActiveMime_Fmt SmartCharts_Fmt Webex_ARF_Fmt Webex_WRF_Fmt PGP_NetShare_Fmt Ability_WP_OLE_Fmt Ability_SS_OLE_Fmt InDesign_IDML_Fmt Executable_JAR_Fmt IDOL_IDX_Fmt Android_Package_Kit_ Fmt Android_Binary_XML_ Fmt Java_WAR_Fmt Java_EAR_Fmt Atom_Syndication_Fmt RSS_Fmt SMIL_Fmt Number 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 XSLT_Fmt 1008 XML_Shareable_Playlist_ 1009 Fmt FictionBook_Fmt 1010 Adobe_Premiere_ Project_Fmt 1011 Category 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 Description Digital Picture Exchange (DPX) image format Apache Avro binary format InstallShield archive (early 
versions) format MIME Type image/dpx Mac OS-X (Mach-O) executable format GDSII data format Microsoft ActiveMime (mso) documents BizInt SmartCharts data format Webex advanced network ARF recordings Webex local WRF recordings Symantec PGP NetShare encrypted file Ability Write later versions format Ability Spreadsheet later versions format Adobe InDesign IDML format Executable Java Archive (jar) file IDOL Server IDX file Android Package Kit (APK) format application/x-mso application/vnd.adobe.indesign-idml-package application/java-archive application/vnd.android.package-archive Android Binary XML (compressed by aapt) format application/xml Java WAR file format Java EAR file format Atom Syndication Format application/atom+xml RSS syndication XML format application/rss+xml Synchronized Multimedia Integration Language (SMIL) XML format application/smil+xml Extensible Stylesheet Language Transformations (XSLT) format application/xslt+xml XML Shareable Playlist Format (XSPF) application/xspf+xml FictionBook e-book XML format Adobe Premiere project format application/x-fictionbook+xml image/vnd.adobe.premiere Extension DPX AVRO EX_ GDS, GDS2 MSO CHP, CHRR ARF WRF AWW AWS IDML JAR IDX APK XML WAR EAR ATOM RSS SMIL XSL, XSLT XSPF FB2 PPJ File Class adRASTERIMAGE adMISC adENCAPSULATION Readers avrosr adEXECUTABLE adCAD adMISC adMISC adMOVIE adMOVIE adENCAPSULATION adWORDPROCESSOR adSPREADSHEET adDESKTOPPUBLSH adENCAPSULATION adENCAPSULATION adEXECUTABLE gdsiisr olesr unzip adWORDPROCESSOR adENCAPSULATION adENCAPSULATION adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR xmlsr xmlsr xmlsr adWORDPROCESSOR xmlsr adWORDPROCESSOR xmlsr adWORDPROCESSOR adMISC xmlsr
Format Name Number RDF_XML_Fmt 1012 Really_Simple_Discovery_Fmt 1013 SBML_Fmt 1014 SRU_Fmt 1015 SSML_Fmt 1016 PLS_Fmt 1017 TEI_Fmt 1018 METS_Fmt 1019 MODS_Fmt 1020 Metalink_Fmt 1021 Open_eBook_Fmt 1022 SRGS_Fmt 1023
SPARQL_Results_Fmt 1024 Adobe_XML_Data_Package_Fmt 1025 ESzigno_Fmt 1026 Mozilla_XUL_Fmt 1027 SyncML_Fmt 1028 VoiceXML_Fmt 1029 TI_Target_Configuration_Fmt 1030 LZFSE_Fmt 1031 Kindle_eBook_Fmt 1032 Oasis_Stream_Fmt 1033 Category 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 Description RDF/XML format Really Simple Discovery (RSD) XML format MIME Type application/rdf+xml application/rsd+xml Systems Biology Markup Language (SBML) XML format application/sbml+xml Search/Retrieve via URL (SRU) XML format application/sru+xml Speech Synthesis Markup Language (SSML) XML format application/ssml+xml Pronunciation Lexicon Specification (PLS) XML format application/pls+xml Text Encoding Initiative (TEI) XML format application/tei+xml Metadata Encoding and Transmission Standard (METS) XML format application/mets+xml Metadata Object Description Schema (MODS) XML format application/mods+xml Metalink XML format application/metalink4+xml Open eBook (OEBPS) XML format application/oebps-package+xml Speech Recognition Grammar Specification (SRGS) XML format application/srgs+xml SPARQL Query Results XML format application/sparql-results+xml Adobe XML Data Package format application/vnd.adobe.xdp+xml e-Szigno signed XML document application/vnd.eszigno3+xml Mozilla XML User Interface Language (XUL) XML format application/vnd.mozilla.xul+xml Synchronization Markup Language (SyncML) XML format application/vnd.syncml+xml VoiceXML (VXML) XML format application/voicexml+xml Texas Instruments CCXML target configuration XML format Lempel-Ziv Finite State Entropy (LZFSE) compression format Amazon Kindle or Mobipocket eBook format application/vnd.amazon.ebook Open Artwork System Interchange Standard (OASIS) format Extension RDF RSD SBML SRU SSML PLS TEI METS MODS METALINK OPF SRGS SRX XDP ES3 XUL XML VXML CCXML LZFSE AZW, PRC OAS File Class adWORDPROCESSOR adWORDPROCESSOR Readers xmlsr xmlsr adWORDPROCESSOR xmlsr adWORDPROCESSOR
adWORDPROCESSOR xmlsr xmlsr adWORDPROCESSOR xmlsr adWORDPROCESSOR adWORDPROCESSOR xmlsr xmlsr adWORDPROCESSOR xmlsr adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR xmlsr xmlsr xmlsr adWORDPROCESSOR adWORDPROCESSOR xmlsr xmlsr adWORDPROCESSOR adWORDPROCESSOR xmlsr xmlsr adWORDPROCESSOR xmlsr adWORDPROCESSOR adWORDPROCESSOR xmlsr adENCAPSULATION adWORDPROCESSOR adMISC Page 145 of 284 Filter SDK Java Programming Guide Appendix A: Supported Formats Format Name Number Amazon_KFX_Fmt 1034 KTX_Fmt 1035 GMSH_Mesh_Fmt 1036 Collada_DAE_Fmt 1037 YIN_Fmt 1038 MPEG_Playlist_Fmt 1039 Windows_Audio_Playlist_ 1040 Fmt DTS_Audio_Fmt 1041 Chemical_Markup_ Language_Fmt 1042 CrystalMaker_Fmt 1043 VTK_XML_Fmt 1044 IPFIX_Fmt 1045 Portable_Font_ Resource_Fmt 1046 MARC_Fmt 1047 MARC_XML_Fmt 1048 XAR_Fmt Symbian_Installer_Fmt SO_Drawing_XML_Fmt 1049 1050 1051 SO_Text_Global_XML_ Fmt ODF_Chart_Fmt ODF_Database_Fmt ODF_Image_Fmt ODF_Text_Master_Fmt ODF_Text_Web_Fmt ODF_Chart_Template_ Fmt ODF_Formula_ 1052 1053 1054 1055 1056 1057 1058 1059 Category 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 316 991 992 993 994 995 996 997 998 Description Amazon KFX eBook format KTX image format GMSH Mesh polygon format Collada Digital Asset Exchange (DAE) format YIN XML format MPEG audio playlist format Windows Audio playlist format MIME Type image/ktx model/mesh model/vnd.collada+xml application/yin+xml audio/mpegurl audio/x-ms-wax Extension KFX KTX MSH DAE YIN M3U WAX DTS Coherent Acoustics audio format Chemical Markup Language (CML) XML format audio/vnd.dts chemical/x-cml DTS CML CrystalMaker chemical format Visualization Toolkit VTK XML format IP Flow Information Export (IPFIX) format Portable Font Resource font format chemical/x-cmdf model/vnd.vtu application/ipfix application/font-tdpfr CMDF VTU IPFIX PFR Machine-Readable Cataloging (MARC21) format Machine-Readable Cataloging (MARC) XML format Extensible Archive (XAR) format Symbian installer format OpenDocument format 
(OpenOffice 1/StarOffice 6.7) Drawing XML OpenDocument format (OpenOffice 1/StarOffice 6.7) Writer Master document XML ODF Chart ODF Database ODF Image ODF Text Master ODF Text Web ODF Chart Template application/marc application/marcxml+xml application/vnd.symbian.install application/vnd.sun.xml.draw application/vnd.sun.xml.writer.global application/vnd.oasis.opendocument.chart application/vnd.sun.xml.base application/vnd.oasis.opendocument.image application/vnd.oasis.opendocument.text-master application/vnd.oasis.opendocument.text-web application/vnd.oasis.opendocument.chart-template MARC XML SIS SXD SXG ODC ODB ODI ODM OTH OTC ODF Formula Template application/vnd.oasis.opendocument.formula-template OTF File Class adWORDPROCESSOR adRASTERIMAGE adCAD adCAD adWORDPROCESSOR adSOUND adSOUND Readers xmlsr xmlsr xmlsr adSOUND adWORDPROCESSOR xmlsr adSCIENTIFIC adVECTORGRAPHIC adMISC adFONT xmlsr adDATABASE adWORDPROCESSOR xmlsr adENCAPSULATION adENCAPSULATION adVECTORGRAPHIC kpodfrdr adWORDPROCESSOR adVECTORGRAPHIC adDATABASE adRASTERIMAGE adWORDPROCESSOR adWORDPROCESSOR adVECTORGRAPHIC odfwpsr odfwpsr adWORDPROCESSOR unzip IDOL KeyView (12.13) Page 146 of 284 Filter SDK Java Programming Guide Appendix A: Supported Formats Format Name Number Template_Fmt ODF_Drawing_ Template_Fmt 1060 ODF_Image_Template_ 1061 Fmt ODF_Presentation_ Template_Fmt 1062 ODF_Spreadsheet_ Template_Fmt 1063 ODF_Text_Template_ Fmt 1064 ODF_Chart_XML_Fmt 1065 ODF_Drawing_XML_Fmt 1066 ODF_Formula_XML_Fmt 1067 ODF_Image_XML_Fmt 1068 ODF_Presentation_XML_ 1069 Fmt ODF_Spreadsheet_XML_ 1070 Fmt ODF_Text_XML_Fmt 1071 ODF_Extension_Fmt 1072 StarView_Metafile_Fmt 1073 BBeB_LRF_eBook_Fmt 1074 GPG_Trust_DB_Fmt 1075 VICE_Emulator_Fmt 1076 Portable_Game_ Notation_Fmt 1077 Doom_WAD_Fmt 1078 Device_Tree_Blob_Fmt 1079 BDF_Font_Fmt 1080 PC_Screen_Font_Fmt 1081 JNLP_Fmt 1082 XAML_Browser_ Application_Fmt 1083 Category 316 999 316 315 314 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 
1015 1016 1017 1018 Description ODF Drawing/Graphics Template ODF Image Template ODF Presentation Template ODF Spreadsheet Template ODF Text Template ODF Chart flat XML format ODF Drawing/Graphics flat XML format ODF Formula flat XML format ODF Image flat XML format ODF Presentation flat XML format ODF Spreadsheet flat XML format ODF Text flat XML format ODF Extension format OpenOffice StarView MetaFile format Broad Band eBook (BBeB) in LRF format GPG trust database format VICE (Versatile Commodore Emulator) format Portable Game Notation chess format Doom IWAD/PWAD format Linux Device Tree Blob format Glyph Bitmap Distribution Format PC Screen Font format Java Network Launching Protocol XAML Browser Application (XBAP) format MIME Type Extension application/vnd.oasis.opendocument.graphics-template OTG application/vnd.oasis.opendocument.image-template OTI application/vnd.oasis.opendocument.presentation-template OTP application/vnd.oasis.opendocument.spreadsheet-template OTS application/vnd.oasis.opendocument.text-template OTT application/vnd.oasis.opendocument.chart.xml application/vnd.oasis.opendocument.formula.xml application/vnd.oasis.opendocument.graphics.xml application/vnd.oasis.opendocument.image.xml application/vnd.oasis.opendocument.presentation.xml FODC FODG FODF FODI FODP application/vnd.oasis.opendocument.spreadsheet.xml FODS application/vnd.oasis.opendocument.text.xml application/vnd.openofficeorg.extension image/x-svm application/x-ext-lrf application/vnd.chess-pgn FODT OXT SVM LRF GPG VSF PGN application/x-doom application/x-font-bdf application/x-font-psf application/x-java-jnlp-file application/x-ms-xbap WAD DTB BDF PSF JNLP XBAP File Class Readers adVECTORGRAPHIC kpodfrdr adRASTERIMAGE adPRESENTATION kpodfrdr adSPREADSHEET odfsssr adWORDPROCESSOR odfwpsr adVECTORGRAPHIC adWORDPROCESSOR adVECTORGRAPHIC adRASTERIMAGE adPRESENTATION adSPREADSHEET adWORDPROCESSOR adMISC adRASTERIMAGE adWORDPROCESSOR adMISC adMISC adWORDPROCESSOR adMISC adMISC adFONT
adFONT adWORDPROCESSOR adWORDPROCESSOR xmlsr xmlsr IDOL KeyView (12.13) Page 147 of 284 Filter SDK Java Programming Guide Appendix A: Supported Formats Format Name MS_Binder_Fmt XAP_Fmt StuffIt_X_Fmt FIG_Fmt Number 1084 1085 1086 1087 XPInstall_Fmt 1088 XDF_Fmt 1089 MXML_Fmt 1090 MusicXML_Fmt 1091 Finale_Fmt 1092 Spotfire_DXP_Fmt 1093 MS_Office_Theme_ 2007_Fmt 1094 Adobe_AIR_Installer_Fmt 1095 Flex_Project_Fmt FoxPro_Fmt VST_Preset_Fmt Mischief_Image_Fmt FreeArc_Fmt Autodesk_3ds_Fmt Monkeys_Audio_Fmt CALS_Fmt Dr_Halo_PAL_Fmt DPG_Fmt JPEG_XR_Fmt TCR_eBook_Fmt 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 IHEX_Fmt QCOW_Fmt VDI_Fmt 1108 1109 1110 Category 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 Description Microsoft Office Binder format Microsoft Silverlight application (XAP) format StuffIt X (SITX) archive format Facility for Interactive Generation of figures (FIG) image format XPInstall Cross-Platform Installer Module (XPI) format Extensible Data Format (XDF) XML format MXML UI markup language XML format MusicXML format Finale audio format TIBCO Spotfire DXP data format Microsoft Office theme format MIME Type application/x-msbinder application/x-silverlight-app application/x-stuffitx application/x-xfig application/x-xpinstall application/vnd.recordare.musicxml application/vnd.spotfire.dxp application/vnd.ms-officetheme Adobe AIR application installer package Adobe Flash Flex project file format FoxPro compiled source format Virtual Studio Technology (VST) preset format Mischief vector graphics image format FreeArc archive format Autodesk 3ds format Monkey's Audio format CALS raster image format Dr Halo raster image PAL file format Nintendo DS DPG video format JPEG XR (extended range) image format TCR/ZVR (Text Compression for Reader) eBook format Intel Hex format QEMU Copy On Write VirtualBox Disk Image 
application/vnd.adobe.air-application-installer-package+zip application/vnd.adobe.fxp application/x-freearc application/x-3ds image/vnd.ms-photo Extension OBP XAP SITX FIG XPI XDF MXML MXL MUS DXP THMX AIR FXP FXP FXP ART ARC 3DS APE CAL PAL DPG JXR, HDP TCR, ZVR IHEX QCOW VDI File Class adENCAPSULATION adENCAPSULATION adENCAPSULATION adVECTORGRAPHIC Readers olesr adENCAPSULATION adWORDPROCESSOR adWORDPROCESSOR adENCAPSULATION adSOUND adANALYTICS adMISC xmlsr xmlsr xmlsr adENCAPSULATION adENCAPSULATION adLIBRARY adSOUND adVECTORGRAPHIC adENCAPSULATION adCAD adSOUND adRASTERIMAGE adRASTERIMAGE adMOVIE adRASTERIMAGE adWORDPROCESSOR adENCAPSULATION adENCAPSULATION adENCAPSULATION
Format Name Number OneNote_Alternate_Fmt 1111 RMS_Protected_Fmt 1112 Portfolio_PDF_Fmt 1113 Crystal_Reports_Fmt 1114 Thumbs_db_Fmt 1115 PagePlus_Fmt 1116 MS_Project_Exchange_Fmt 1117 MS_Management_Pack_MPX_Fmt 1118 AutoCAD_VBA_Project_Fmt 1119 PLY_ASCII_Fmt 1120 PLY_Binary_Fmt 1121 JavaView_JVX_Fmt 1122 X3D_Fmt 1123 ZBrush_Project_Fmt 1124 ZBrush_Tool_Fmt 1125 Windows_Installer_Patch_Fmt 1126 Windows_Installer_Transform_Fmt 1127 Lotus_Approach_Fmt 1128 Outlook_SendRcv_Settings_Fmt 1129 MS_Publisher_Scheme_Fmt 1130 SO_Chart_Fmt 1131 SO_Database_Fmt 1132 SO_Library_Fmt 1133 PageMaker_Document_Fmt 1134 MS_DTS_Fmt 1135 Category 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 Description OneNote Alternative Packaging Format Rights Management Services (RMS)-protected format Portfolio PDF File SAP Crystal Reports format Microsoft Windows thumbs.db format Serif PagePlus format Microsoft Project Exchange format MIME Type application/pdf application/x-rpt Microsoft Systems Center Operation Manager (SCOM) management pack MPX format AutoCAD VBA project format Polygon File Format (PLY) ASCII format Polygon File Format (PLY)
binary format JavaView XML (JVX) format Extensible 3D Graphics (X3D) XML format model/x3d+xml ZBrush ZProject (ZPR) format ZBrush ZTool (ZTL) format Microsoft Windows Installer Patch Package (MSP) format Microsoft Windows Installer Transform (MST) format Lotus Approach format application/vnd.lotus-approach Microsoft Outlook 2002 Send-Receive Settings Microsoft Publisher colour scheme Star Office 4,5 Chart Star Office 4,5 Database Star Office 4,5 Library Adobe PageMaker document application/vnd.stardivision.chart application/vnd.stardivision.base application/pagemaker Microsoft Data Transformation Services (DTS) package file Extension PFILE, PPDF, PJPG, PTXT PDF RPT DB PPP MPX MPX DVB PLY PLY JVX X3D ZPR ZTL MSP MST APR, MPR SRS SCM SDS SDB SBL PMD DTS File Class adWORDPROCESSOR adWORDPROCESSOR Readers onealtsr pfilesr adWORDPROCESSOR adANALYTICS adENCAPSULATION adDESKTOPPUBLSH adSCHEDULE pdfsr olesr olesr adMISC xmlsr adMISC adCAD adCAD adCAD adCAD adCAD adCAD adENCAPSULATION xmlsr olesr adENCAPSULATION adDATABASE adMISC adMISC adVECTORGRAPHIC adDATABASE adLIBRARY adDESKTOPPUBLSH olesr olesr adMISC
Format Name Number Cognos_PowerPlay_PPR_Fmt 1136 Visual_Studio_SUO_Fmt 1137 MS_GraphEdit_Fmt 1138 ArcGIS_Graph_Fmt 1139 SID_Audio_Fmt 1140 MrSID_Fmt 1141 Cardfile_Fmt 1142 MS_Word_Mac_4_Fmt 1143 WordPerfect_5_Fmt 1144 WordPerfect_6_Fmt 1145 WordPerfect_Graphics_1_Fmt 1146 Organization_Chart_Fmt 1147 Lotus_Organizer_Fmt 1148 Category 1071 1072 1073 1074 1075 1076 1077 205 80 178 85 1078 1079 Description Cognos PowerPlay up to version 7 (PPR) format MIME Type Microsoft Visual Studio solution user options (suo) file Microsoft GraphEdit File format ArcGIS Graph format SID Audio format audio/prs.sid LizardTech MrSID image format image/x-mrsid Microsoft Windows Cardfile address book format application/x-mscardfile Microsoft Word for Macintosh (version 4,5) application/msword
WordPerfect (version 5) application/x-corel-wordperfect Corel WordPerfect (version 6 and higher) application/x-corel-wordperfect WordPerfect Graphics (version 1) application/vnd.wordperfect OrgPlus Organization Chart Lotus Organizer documents application/orgplus application/vnd.lotus-organizer MS_DBML_Fmt XMind_Fmt MSI_Cerius_Fmt GenBank_Fmt GIS_World_File_Fmt 1149 1150 1151 1152 1153 1080 1081 1082 1083 1084 Microsoft Database Markup Language XML document XMind document MSI Cerius chemical formula document GenBank DNA character sequence document ESRI GIS World file application/xmind chemical/x-cerius chemical/x-genbank GIS_Projection_ Metadata_Fmt 1154 PowerWorld_Binary_Fmt 1155 PowerWorld_Display_ Fmt 1156 ArcXML_Fmt 1157 GAMS_GDX_Fmt 1158 1085 1086 1087 1088 1089 ESRI Projection Metadata (PRJ) file PowerWorld Binary (PWB) file PowerWorld Display (PWD) file ESRI ArcIMS project XML file (ArcXML) General Algebraic Modeling System (GAMS) Data Exchange (GDX) format IDOL KeyView (12.13) Extension File Class Readers PPR adANALYTICS SUO adSWDEV GRF GRF SID SID CRD DOC WOP, DOC WPD WPG, QPG OPX OR2, OR3, OR4, OR5, OR6 DBML adMISC adGIS adSOUND adRASTERIMAGE adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adRASTERIMAGE, adVECTORGRAPHIC adDATABASE adSCHEDULE mbsr wosr wp6sr adWORDPROCESSOR XMIND adPRESENTATION MSI adSCIENTIFIC GB adSCIENTIFIC BPW, GFW, JGW, adGIS afsr J2W, PGW, SDW, TFW, WLD PRJ adGIS PWB PWD adCAD adCAD AXL GDX adGIS adSCIENTIFIC Page 150 of 284 Filter SDK Java Programming Guide Appendix A: Supported Formats Format Name ArcMap_MXD_Fmt RRDtool_Fmt HWPX_Fmt SolidWorks_2015_Fmt Number 1159 1160 1161 1162 Category 1090 1091 1092 1093 Description ArcMap Map Exchange Document project (MXD) RRDtool (Round Robin Database) data file Hangul HWPX document SolidWorks (2015 onwards) file MIME Type application/hwp+zip MS_Photo_Editor_Fmt MS_Word_HTML_Fmt MS_Excel_HTML_Fmt Portable_FloatMap_Fmt RGBE_Fmt 1163 1164 1165 1166 1167 1094 1095 1096 1097 1098 
Microsoft Photo Editor 'embedded GIF' file Microsoft Word HTML format Microsoft Excel HTML format Portable FloatMap (PFM) image Radiance RGBE (HDR) image application/vnd.ms-photo-editor image/x-portable-floatmap image/vnd.radiance APNG_Fmt 1168 Enhanced_Compressed_Wavelet_Fmt 1169 Ensoniq_Waveset_Fmt 1170 Corel_Photo_Paint_Fmt 1171 OpenRaster_Fmt 1172 Krita_Fmt 1173 Gerber_Fmt 1174 PGML_Fmt 1175 Away3D_Fmt 1176 CAD_3MF_Fmt 1177 AMF_Fmt 1178 C3D_Fmt 1179 CAD_3DSystems_BFF_Fmt 1180 NRRD_Fmt 1181 Cinema_4D_Fmt 1182 FBX_ASCII_Fmt 1183 FBX_Binary_Fmt 1184 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 Animated Portable Network Graphics (AnimatedPNG) Enhanced Compressed Wavelet image image/apng image/ecw Ensoniq Waveset audio data file Corel Photo Paint (version 7 and higher) OpenRaster image Krita image Gerber image format Precision Graphics Markup Language Away3D scene file 3D Manufacturing Format document Additive manufacturing file format (AMF) document Coordinate 3D (C3D) format 3D Sprint (3D Systems) SLA Build file image/x-corelphotopaint image/openraster application/x-krita application/vnd.gerber application/vnd.ms-package.3dmanufacturing-3dmodel+xml application/x-amf NRRD (nearly raw raster data) image format Cinema 4D model Kaydara FBX project (ASCII) Kaydara FBX project (binary) Extension File Class MXD adGIS RRD adDATABASE HWPX adWORDPROCESSOR SLDPRT, SLDDRW, SLDASM adCAD adRASTERIMAGE DOC, HTM adWORDPROCESSOR XLS, HTM adWORDPROCESSOR PFM adRASTERIMAGE HDR, PIC, RGBE, XYZE adRASTERIMAGE APNG, PNG adANIMATION Readers hwpxsr htmlsr htmlsr kppngrdr ECW adRASTERIMAGE ECW CPT ORA KRA GBR PGML AWD 3MF adSOUND adRASTERIMAGE adRASTERIMAGE adRASTERIMAGE adVECTORGRAPHIC adVECTORGRAPHIC adCAD adCAD xmlsr AMF adCAD xmlsr C3D BFF adCAD adCAD NRRD C4D FBX FBX adRASTERIMAGE adCAD adCAD adCAD
Format Name Number Wavefront_OBJ_Fmt 1185
Wavefront_MTL_Fmt 1186 MS_Power_BI_Template_Fmt 1187 Windows_Sticky_Notes_Fmt 1188 BlakHole_Fmt 1189 PowerArchiver_Fmt 1190 PageMagic_Fmt 1191 PIM_Archiver_Fmt 1192 Softdisk_Text_Compressor_Fmt 1193 Ability_PhotoPaint_Fmt 1194 Softlib_Fmt 1195 Timeworks_Publisher_Fmt 1196 Scribe_Fmt 1197 SQLite_Write_Ahead_Log_Fmt 1198 SQLite_WAL_Index_Fmt 1199 AutoForm_Design_Fmt 1200 TSV_Fmt 1201 OpenStreetMap_XML_Fmt 1202 OpenStreetMap_PBF_Fmt 1203 Nero_Audio_Compilation_Fmt 1204 Nero_ISO_Compilation_Fmt 1205 WordStar_for_Windows_Fmt 1206 MS_Outlook_PAB_Fmt 1207 HLSL_FXO_Fmt 1208 Category 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 Description Wavefront OBJ geometry definition file Wavefront Material Template Library (MTL) Microsoft Power BI Desktop template format MIME Type Microsoft Windows Sticky Notes format BlakHole compression format PowerArchiver PA compression format NEBS PageMagic format PIM Archiver format Softdisk Text Compressor format Ability Office PhotoPaint image Softdisk Softlib compression format Timeworks Publisher (Publish It) format Scribe markup language and word processing system SQLite Write-Ahead Log file SQLite WAL-index (shm) file AutoForm Design file Tab-separated values (TSV) file OpenStreetMap XML data text/tab-separated-values OpenStreetMap Protocolbuffer Binary Format data file (.osm.pbf) Nero Audio-CD compilation file Nero ISO compilation file WordStar for Windows file Microsoft Outlook Personal Address Book (PAB) DirectX High-Level Shader Language (HLSL) pre-compiled shader
Extension OBJ MTL PBIT SNT BH PA DTP PIM CTX APX SLB DTP MSS WAL SHM AFD TSV, TAB OSM PBF NRA NRI WSD PAB FXO File Class adCAD adCAD adANALYTICS Readers adWORDPROCESSOR adENCAPSULATION adENCAPSULATION adDESKTOPPUBLSH adENCAPSULATION adENCAPSULATION olesr adRASTERIMAGE adENCAPSULATION adDESKTOPPUBLSH adWORDPROCESSOR afsr adDATABASE adDATABASE adCAD adWORDPROCESSOR adGIS afsr, afsr adGIS adMISC adMISC adWORDPROCESSOR stringssr adMISC adCAD
Format Name Number HLSL_CSO_Fmt 1209 Oberon_Document_Fmt Oberon_Symbol_Fmt Oberon_Code_Fmt 1210 1211 1212 Python_Bytecode_Fmt PCPaint_Fmt PCRaster_Map_Fmt 1213 1214 1215 COM_Type_Library_Fmt 1216 MS_Visual_C_Export_Fmt 1217 Lotus_Organizer_Report_Fmt 1218 Audible_Audiobook_AA_Fmt 1219 DOS_RED_Fmt 1220 CA_ZIPXP_Fmt 1221 Kindle_Topaz_Fmt 1222 Windows_Shim_Database_Fmt 1223 MS_Incremental_Linker_Fmt 1224 Lotus_Smart_Icon_Fmt 1225 Lotus_Organizer_Layout_Fmt 1226 CMZ_Fmt 1227 RFFlow_Fmt 1228 InstallShield_Script_Fmt 1229 InstallShield_Rules_Fmt 1230 Windows_FTS_Fmt 1231 Category 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 Description DirectX High-Level Shader Language (HLSL) compiled shader object Component Pascal / Oberon Document file Component Pascal / Oberon Symbol file Component Pascal / Oberon Code (executable and loadable object) file Python compiled bytecode PCPaint / Pictor Paint image format PCRaster Map / Cross System Format geographical data Microsoft Component Object Model (COM) Type library Microsoft Visual C++ Export file MIME Type application/x-bytecode.python Lotus Organizer report document Audible Audiobook (AA) file audio/audible MS-DOS RED installer library format CA Technologies ZIPXP compressed document Amazon Kindle Topaz eBook Microsoft Windows Shim Database file Microsoft Visual Studio incremental linker file Lotus Smart Icon image file Lotus Organizer print/paper layout file CMZ compression format RFFlow flowchart document InstallShield script document InstallShield Compiled Rules file Microsoft Windows 95/NT help full-text-search file
Extension File Class Readers CSO adCAD ODC OSF OCF adSOURCECODE adOBJECTMODULE adEXECUTABLE PYC PIC MAP, CSF adEXECUTABLE adRASTERIMAGE adGIS TLB adLIBRARY EXP adLIBRARY
REP adSCHEDULE AA adSOUND RED CAZ AZW, AZW1, TPZ SDB adLIBRARY adENCAPSULATION adWORDPROCESSOR adDATABASE ILK adSWDEV SMI adRASTERIMAGE PLT adSCHEDULE CMZ FLO INS INX FTS adENCAPSULATION adPRESENTATION adENCAPSULATION adENCAPSULATION adDATABASE
Format Name Number DVD_Info_Fmt 1232 Emacs_Lisp_Bytecode_Fmt 1233 Windows_Resource_Fmt 1234 MS_Precompiled_Header_Fmt 1235 Borland_Turbo_Project_Fmt 1236 PS_Font_Descriptor_Fmt 1237 MySQL_Index_Fmt 1238 MS_SQL_Fmt 1239 DNL_eBook_Fmt 1240 GD_Image_Fmt 1241 ITunes_Library_Fmt 1242 MS_SQM_Fmt 1243 VIFF_Fmt 1244 JBIG_Fmt 1245 CodeWarrior_Project_Fmt 1246 PaintShop_Pro_JBF_Fmt 1247 Delphi_Diagram_Portfolio_Fmt 1248 Adobe_Swatch_Exchange_Fmt 1249 ASCII_Scene_Exporter_Fmt 1250 AVR_Fmt 1251 Winamp_AVS_Fmt 1252 After_Effects_Project_Fmt 1253 Anfy_Applet_Generator_Fmt 1254 SmartCipher_Fmt 1255 Category 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 Description DVD Information (IFO) file Byte-compiled Lisp (Emacs/XEmacs) Microsoft Windows binary resource file Microsoft Visual C/C++ binary pre-compiled header Borland Turbo C project file PostScript binary Font Descriptor file MySQL MyISAM Table index Microsoft SQL Server primary database file DNAML DNL eBook GD Library image Apple iTunes music library Microsoft Windows Live Messenger/Mail log file Khoros Visualization Image File Format (VIFF) JBIG (JBIG1) image CodeWarrior C/C++ project PaintShop Pro JBF image cache file Delphi Diagram Portfolio file Adobe Swatch Exchange Format Autodesk 3ds Max ASCII Scene Exporter file AVR (Audio Visual Research) format Winamp AVS (Advanced Visualization Studio) plug-in file Adobe After Effects project Anfy (Java) Applet Generator file SmartCipher encrypted file MIME Type content/dvd application/x-bytecode.elisp image/x-viff image/jbig image/jbf
Extension IFO ELC RES PCH
PRJ NTF MYI MDF DNL GD, GD2 ITL SQM XV, VIF, VIFF JBG, JBIG, BIE MCP JBF DDP ASE, ASEF ASE AVR AVS AEP AJP File Class adDATABASE adEXECUTABLE Readers adMISC adSWDEV adSWDEV adFONT adDATABASE adDATABASE adWORDPROCESSOR adRASTERIMAGE adDATABASE adMISC adRASTERIMAGE adRASTERIMAGE adSWDEV adMISC adMISC adRASTERIMAGE adCAD adSOUND adSOUND adMOVIE adMISC adENCAPSULATION
Format Name Number General_Exchange_Fmt 1256 Maxis_XA_Fmt 1257 NUT_Fmt 1258 OpenMG_Audio_Fmt 1259 TXD_Fmt 1260 DFA_Fmt 1261 FunCom_ISS_Fmt 1262 Sony_MSV_Fmt 1263 THP_Fmt 1264 Smush_Animation_Fmt 1265 SIFF_Audio_Fmt 1266 SNES_SPC_Fmt 1267 Sierra_VMD_Fmt 1268 VTech_MJP_Fmt 1269 Nullsoft_Video_Fmt 1270 Shorten_Fmt 1271 Leitch_Video_Fmt 1272 ETV_Fmt 1273 TAK_Audio_Fmt 1274 Maelstrom_ANM_Fmt 1275 SW_ANM_Fmt 1276 DeluxePaint_Animation_Fmt 1277 Crack_Art_Fmt 1278 Time_Shift_Video_Fmt 1279 XBV_Fmt 1280 HNM4_Fmt 1281 HNM6_Fmt 1282 NXV_Fmt 1283 VP5_Fmt 1284 FutureVision_FST_Fmt 1285 Category 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 Description General Exchange Format (GXF) Maxis XA audio file NUT Open Container Format Sony OpenMG Audio (OMA) container file Renderware Texture Dictionary (TXD) file DreamForge DFA FMV format FunCom ISS audio Sony Compressed Audio (MSV/DVF) GameCube THP Video LucasArts Smush SAN Animation Format Beam Software SIFF audio file SNES SPC700 audio file Sierra Video and Music Data format VTech MHP video format Nullsoft Video format (NSV) Shorten audio file Leitch Exchange Format video (LXF) ETV video file TAK audio file Maelstrom ANM animation Savage Warriors ANM animation DeluxePaint animation Crack Art image Time Shift Video (TSV) format XBV video CRYO HNM4 video CRYO HNM6 video NXV video On2 VP5 video FutureVision FST video MIME Type application/gxf
Extension GXF XA NUT OMA, OMG
TXD DFA ISS DVF, ICS, MSV THP SAN, NUT SON SPC VMD MJP NSV SHN LXF ETV TAK ANM ANM ANM CA1, CA2, CA3 TSV XBV HNM HNM, HNS NXV VP5 FST File Class adMOVIE adSOUND adMOVIE adSOUND adRASTERIMAGE adMOVIE adSOUND adSOUND adMOVIE adANIMATION adSOUND adSOUND adMOVIE adMOVIE adMOVIE adSOUND adMOVIE adMOVIE adSOUND adANIMATION adANIMATION adANIMATION adRASTERIMAGE adMOVIE adMOVIE adMOVIE adMOVIE adMOVIE adMOVIE adMOVIE Readers
Format Name Number Electronic_Arts_Audio_Fmt 1286 YOP_Fmt 1287 Matrox_Setup_Program_Fmt 1288 Vivado_Design_Suite_Fmt 1289 Meridian_Lossless_Packing_Fmt 1290 Electronic_Arts_SEAD_Fmt 1291 Electronic_Arts_MPC_Fmt 1292 PMP_Fmt 1293 DEGAS_Fmt 1294 DEGAS_Compressed_Fmt AutoCAD_Plotter_Fmt 1295 1296 Category 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 Description Electronic Arts audio file Psygnosis YOP video Matrox Setup Program Archive MVA file Xilinx Vivado Design Suite file Meridian Lossless Packing Audio file Electronic Arts SEAD audio Electronic Arts MPC video PMP video DEGAS (Design & Entertainment Graphic Arts System) image DEGAS (Design & Entertainment Graphic Arts System) compressed image AutoCAD Plot Style and Configuration files MIME Type Tiny_Stuff_Fmt 1297 1228 Tiny Stuff image JV_Video_Fmt REDCode_Fmt SIFF_Video_Fmt VP6_Fmt MTV_Fmt RSO_Fmt Star3_Fmt DXA_Fmt MTH_Fmt MAD_Fmt Bink2_Fmt PVA_Fmt 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 Bitmap Brothers JV video REDCode video format Beam Software SIFF video file On2 VP6 video Chinese MP4/MTV video Mindstorm RSO audio Creative Labs Star 3 audio Runesoft DXA video Nintendo GameCube video file Electronic Arts MAD video file Bink Video 2 audio-video container TechnoTrend PVA video
Extension STR File Class adSOUND YOP MVA adMOVIE adMISC VDS adMISC MLP adSOUND TGV adSOUND MPC adMOVIE PMP PI1, PI2, PI3 adMOVIE
adRASTERIMAGE PC1, PC2, PC3 adRASTERIMAGE CTB, STB, PC3, PMP TNY, TN1, TN2, TN3, TN4, TN5, TN6 JV R3D VB VP6 MTV RSO ST3 DXA MTH MAD BIK, BK2 PVA adCAD adRASTERIMAGE adMOVIE adMOVIE adMOVIE adMOVIE adMOVIE adSOUND adSOUND adMOVIE adMOVIE adMOVIE adMOVIE adMOVIE Readers
Format Name Interplay_ACMP_Fmt Ipix_Fmt IVR_Fmt NuppelVideo_Fmt VFlash_PTX_Fmt PMD_Ringtone_Fmt RoQ_Fmt CRYO_APC_Fmt VGZ_Fmt Novastorm_Video_Fmt UTalk_Fmt Xbox_XMV_Fmt AbiWord_Fmt AbiWord_Template_Fmt Psion_Word_Fmt Psion_Sheet_Fmt Psion_Sketch_Fmt Psion_Record_Fmt Psion_MBM_Fmt Psion_TextEd_Fmt Psion_AIF_Fmt Psion_PIC_Fmt Psion_Object_Fmt Psion_Executable_Fmt Psion_Sound_Fmt Psion_Database_Fmt Psion_Word_3_Fmt Psion_Sheet_3_Fmt Zoner_Draw_Fmt Zoner_BMI_Fmt Number 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 Category 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 Description MIME Type Interplay ACMP audio Ipix spherical image RealNetworks Internet Video Recording (IVR) file NuppelVideo file VTech V.Flash VTX image Polyphonic Ringtone PMD audio application/x-pmd RoQ video CRYO Interactive APC audio VGZ video Novastorm Media video file MicroTalk/UTalk audio Microsoft Xbox XMV video AbiWord document application/x-abiword AbiWord template Psion EPOC Word document Psion EPOC Sheet spreadsheet Psion EPOC Sketch image Psion EPOC Record audio Psion EPOC Multi-Bitmap (MBM) image Psion EPOC TextEd file Psion EPOC Application Information File (AIF) Psion 3 PIC bitmap Psion 3 OPL Object File Psion 3 IMG/APP executable Psion 3 Sound file Psion EPOC Database Psion 3 Word document Psion 3 Sheet spreadsheet Zoner Draw / Zoner Callisto Metafile (ZMF) version 4+ Zoner BMI image
Extension File Class adSOUND IPX
adRASTERIMAGE IVR adMOVIE NUV adMOVIE PTX adRASTERIMAGE PMD adSOUND ROQ adMOVIE APC, HNM, BF, ZIK adSOUND VGZ adMOVIE FA, FLM adMOVIE UTK adSOUND XMV adMOVIE ABW adWORDPROCESSOR ABT adWORDPROCESSOR PSI, PSITEXT adWORDPROCESSOR PSISHEET adSPREADSHEET adRASTERIMAGE adSOUND MBM adRASTERIMAGE adWORDPROCESSOR AIF adRASTERIMAGE PIC adRASTERIMAGE OPA, OPO adENCAPSULATION IMG, APP adEXECUTABLE WVE adSOUND adDATABASE WRD adWORDPROCESSOR SPR adSPREADSHEET ZMF adVECTORGRAPHIC Readers xmlsr stringssr stringssr stringssr BMI adRASTERIMAGE
Format Name Number TealDoc_Fmt 1340 TealPaint_Fmt 1341 PalmDOC_Fmt 1342 QiOO_Fmt 1343 Plucker_Fmt 1344 eReader_Fmt 1345 Quickword_Fmt 1346 Quicksheet_Fmt 1347 Quickpoint_Fmt 1348 TealMeal_Fmt 1349 zTXT_Fmt 1350 TomeRaider_Fmt 1351 TomeRaider_PDB_Fmt 1352 WordSmith_Fmt 1353 iSilo_Fmt 1354 SuperMemo_Fmt 1355 BDicty_Fmt 1356 PalmOS_Executable_Fmt 1357 PalmOS_Library_Fmt 1358 Shanda_Bambook_Fmt 1359 PMLZ_Fmt 1360 Rocket_eBook_Fmt 1361 iBooks_Author_Fmt 1362 Statistica_Spreadsheet_Fmt 1363 Statistica_Graph_Fmt 1364 Statistica_Scrollsheet_Fmt 1365 Apple_Newton_Package_Fmt 1366 Adobe_Zip_Extension_Fmt 1367 Category 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 Description TealDoc PalmOS eBook TealPaint PalmOS eBook PalmDOC / Aportis DOC eBook QiOO mobile eBook Plucker eBook eReader (Palm Reader/ Peanut Reader) eBook PalmOS Quickword document PalmOS Quicksheet document PalmOS Quickpoint document TealMeal PalmOS database zTXT eBook TomeRaider eBook TomeRaider PDB eBook PalmOS Wordsmith document PalmOS iSilo document PalmOS SuperMemo document PalmOS BDicty document PalmOS executable PalmOS dynamic library Shanda Bambook eBook Palm Markup Language (PMLZ) eBook Rocket eBook Apple iBooks Author eBook Statsoft Statistica Spreadsheet Statsoft Statistica Graph File Statsoft Statistica
Scrollsheet Apple Newton executable/installer/file Adobe Zip Format Extension Package (ZXP) MIME Type application/x-aportisdoc application/prs.plucker application/x-pdb-ztxt-ebook application/x-pdb-isilo-ebook application/vnd.palm application/x-snb-ebook application/x-rocketbook application/vnd.apple.ibauthor application/vnd.adobe.air-ucf-package+zip Extension PDB PDB PRC, PDB JAR PDB PDB PRC PRC PRC PDB PDB TR TR2, TR3 PDB KNO, PDB PDB PRC PRC SNB PMLZ RB IBA STA STG SCR PKG ZXP File Class adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adSPREADSHEET adPRESENTATION adDATABASE adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adEXECUTABLE adLIBRARY adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adSPREADSHEET Readers stringssr adVECTORGRAPHIC adSPREADSHEET adEXECUTABLE adENCAPSULATION
Format Name Number Uniform_Office_Fmt 1368 Uniform_Office_Text_Fmt 1369 Uniform_Office_Spreadsheet_Fmt 1370 Uniform_Office_Presentation_Fmt 1371 Uniform_Office_Zip_Fmt 1372 Uniform_Office_Text_Zip_Fmt 1373 Uniform_Office_Spreadsheet_Zip_Fmt 1374 Uniform_Office_Presentation_Zip_Fmt 1375 MacDraft_Fmt 1376 RagTime_Fmt 1377 MacDraw_Fmt 1378 Wingz_Fmt 1379 Claris_Draw_Fmt 1380 BeagleWorks_Word_Fmt 1381 BeagleWorks_Database_Fmt 1382 BeagleWorks_Spreadsheet_Fmt 1383 BeagleWorks_Paint_Fmt 1384 BeagleWorks_Draw_Fmt 1385 GreatWorks_Word_Fmt 1386 GreatWorks_Outline_Fmt 1387 GreatWorks_Database_Fmt 1388 GreatWorks_Spreadsheet_Fmt 1389 GreatWorks_Draw_Fmt 1390 Category 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 Description MIME Type Uniform Office Format document Uniform Office Format word processing document application/vnd.uof.text Uniform Office Format spreadsheet
application/vnd.uof.spreadsheet Uniform Office Format presentation application/vnd.uof.presentation Uniform Office Format document, zip format Uniform Office Format word processing document, zip format application/vnd.uof.text+zip Uniform Office Format spreadsheet, zip format application/vnd.uof.spreadsheet+zip Uniform Office Format presentation, zip format application/vnd.uof.presentation+zip MacDraft drawing RagTime document MacDraw drawing Wingz spreadsheet Claris Draw document BeagleWorks (later WordPerfect Works) Word Processor document BeagleWorks (later WordPerfect Works) Database document BeagleWorks (later WordPerfect Works) Spreadsheet document BeagleWorks (later WordPerfect Works) Paint document BeagleWorks (later WordPerfect Works) Draw document Symantec GreatWorks Word Processor document Symantec GreatWorks Outline document Symantec GreatWorks Database document Symantec GreatWorks Spreadsheet document Symantec GreatWorks Draw document
Extension UOF UOF, UOT UOF, UOS UOF, UOP UOF UOF, UOT UOF, UOS UOF, UOP DRW, MDD RAG, RTD WKZ BW, WPW BW, WPW BW, WPW BW, WPW BW, WPW File Class adWORDPROCESSOR adWORDPROCESSOR adSPREADSHEET Readers xmlsr xmlsr adPRESENTATION adWORDPROCESSOR adWORDPROCESSOR adSPREADSHEET adPRESENTATION adCAD adDESKTOPPUBLSH adVECTORGRAPHIC adSPREADSHEET adVECTORGRAPHIC adWORDPROCESSOR stringssr adDATABASE adSPREADSHEET adRASTERIMAGE adVECTORGRAPHIC adWORDPROCESSOR adOUTLINE adDATABASE stringssr adSPREADSHEET adVECTORGRAPHIC
Format Name Number GreatWorks_Chart_Fmt 1391 MS_Works_3_Mac_WP_Fmt 1392 MS_Works_3_Mac_DB_Fmt 1393 MS_Works_3_Mac_SS_Fmt 1394 MS_Works_3_Mac_Comm_Fmt 1395 MS_Works_3_Mac_Draw_Fmt 1396 SAP_VDS_Fmt 1397 ZIPVFS_Fmt 1398 Right_Hemisphere_Material_Fmt 1399 RH_Thumbnails_Fmt 1400 Westwood_Studios_Audio_Fmt 1401 Shockwave_Stream_Fmt 1402 EGG_Video_Fmt 1403 IRCAM_Fmt 1404 Sierra_Audio_Fmt 1405 TiVo_Video_Fmt 1406
OptimFROG_Fmt 1407 LPAC_Fmt 1408 RK_Audio_Fmt 1409 Asylum_Music_Fmt 1410 Novastorm_Audio_Fmt 1411 HHE_Fmt 1412 Portable_Voice_Fmt 1413 CNM_Video_Fmt 1414 Phantom_Cine_Fmt 1415 MPEG2_Transport_Stream_Fmt 1416 Category 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 Description Symantec GreatWorks Chart document Microsoft Works for Mac, version 3 and 4, Word Processor document Microsoft Works for Mac, version 3 and 4, Database Microsoft Works for Mac, version 3 and 4, Spreadsheet Microsoft Works for Mac, version 3 and 4, Communications document Microsoft Works for Mac, version 3 and 4, Draw document SAP 3d Visual Enterprise VDS document ZIPVFS SQLite compressed read/write database Right Hemisphere Material file MIME Type application/x-msworks application/x-msworks application/x-msworks application/x-msworks application/x-msworks Right Hemisphere thumbnail collection file Westwood Studios Audio file Shockwave Stream audio-video file EGG video file IRCAM audio file Sierra Entertainment audio file TiVo video OptimFROG audio Lossless Predictive Audio Compression file RK Audio lossless compressed audio Asylum Music Format Novastorm Media audio file HHE video Portable Voice Format audio Arxel CNM audio-video format Phantom Cine video file MPEG-2 Transport Stream video
Extension MSW, WPS WDB WKS MSW VDS SQLITE RH, RHM $RH AUD STREAM EGG IRCAM SOL TY+ OFR, OFS PAC RKA AMF SMP HHE PVF CNM CINE M2TS File Class adVECTORGRAPHIC adWORDPROCESSOR Readers adDATABASE adSPREADSHEET adCOMMUNICATION adVECTORGRAPHIC adCAD adDATABASE adCAD adCAD adSOUND adMOVIE adMOVIE adSOUND adSOUND adMOVIE adSOUND adSOUND adSOUND adSOUND adSOUND adMOVIE adSOUND adMOVIE adMOVIE adMOVIE
Format Name Number Audacity_Project_Fmt 1417 Voltage_VSF_Fmt 1418 XLIFF_Fmt 1419 XBRL_Fmt 1420 AuditXPressX_Fmt 1421 Box_Note_Fmt 1422
Hikvision_DVR_Fmt 1423 Electronic_Arts_TGV_Fmt 1424 Electronic_Arts_TGQ_Fmt 1425 Reaper_Video_Fmt 1426 Lightweight_Video_Fmt 1427 Liquid_Audio_Fmt 1428 Extended_Instrument_Fmt 1429 MAML_Fmt 1430 MS_Chat_Character_Fmt 1431 MS_Border_Fmt 1432 MS_Binary_Log_Fmt 1433 MS_Reader_eBook_Fmt 1434 MS_Reader_Annotations_Fmt 1435 Amazon_KFX_Aux_Fmt 1436 Amazon_KFX_Ion_Fmt 1437 MS_DPAPI_Fmt 1438 MS_Streets_Fmt 1439 MS_Fast_Find_Index_Fmt 1440 MS_Fresh_Paint_Fmt 1441 MS_Mathematics_Fmt 1442 Category 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 Description MIME Type Audacity audio project file application/x-audacity-project Micro Focus Voltage VSF encrypted file XML Localization Interchange File Format (XLIFF) application/xliff+xml Extensible Business Reporting Language (XBRL) AuditXPressX file Box Note document Hikvision DVR video Electronic Arts TGV video Electronic Arts TGQ video Reaper Video Lightweight Video Format (LVF) Liquid Audio eXtended Instrument generic audio tracker Microsoft Assistance Markup Language Microsoft Comic Chat Character Microsoft Office Border images Microsoft Binary Log file Microsoft Reader eBook file Microsoft Reader annotation file Amazon KFX eBook auxiliary format (2015) Amazon KFX eBook Ion format (2015) Microsoft Data Protection API (DPAPI) data Microsoft Streets & Trips map Microsoft Office Fast Find Index Microsoft Fresh Paint image Microsoft Mathematics worksheet
Extension File Class Readers AUP VDF XLF XBRL AXPX BOXNOTE TGV TGQ FMV LVF LQT XI AML AVB BDR BLG LIT EBO KFX, AZW KFX, AZW, ION EST FFX FPPX GCW adSOUND adENCAPSULATION adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adMOVIE adMOVIE xmlsr xmlsr adMOVIE adMOVIE adMOVIE adSOUND adSOUND adWORDPROCESSOR adRASTERIMAGE adRASTERIMAGE adMISC adWORDPROCESSOR adWORDPROCESSOR xmlsr adWORDPROCESSOR adWORDPROCESSOR adMISC adGIS adMISC adRASTERIMAGE adSCIENTIFIC
Format Name Number MS_Instrument_Definition_Fmt 1443 MS_Pocket_Streets_Fmt 1444 Obfuscated_OpenType_Fmt 1445 Pfaff_PCS_Fmt 1446 Janome_JEF_Fmt 1447 Husqvarna_HUS_Fmt 1448 Husqvarna_VIP_Fmt 1449 Brother_PEC_Fmt 1450 Brother_PES_Fmt 1451 Viking_SHV_Fmt 1452 VP3_Fmt 1453 SEW_Fmt 1454 Data_Stitch_Tajima_Fmt 1455 Singer_XXX_Fmt 1456 Bernina_ART_Fmt 1457 MS_Prefetch_Fmt 1458 MS_Prefetch_Compressed_Fmt 1459 MS_MapPoint_Fmt 1460 MS_Live_Meeting_Fmt 1461 MS_Speech_Definitions_Fmt 1462 MS_Speech_Data_Fmt 1463 MS_SQL_CE_Fmt 1464 MS_ICE_Project_Fmt 1465 MS_DVR_Fmt 1466 Symbol_Dynamics_EXP_Fmt 1467 XNA_Compiled_Fmt 1468 Outlook_Shortcut_Fmt 1469 Category 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 Description Microsoft MIDI Instrument Definition File Microsoft Pocket Streets map Obfuscated OpenType font (ODTTF) Pfaff PCS embroidery image Janome JEF embroidery format Husqvarna Viking HUS embroidery format Husqvarna Viking-Pfaff VIP embroidery format Brother PEC embroidery format Brother PES embroidery format Viking SHV embroidery format VP3 embroidery format SEW embroidery format Data Stitch Tajima (DST) embroidery image Singer XXX embroidery image Bernina ART embroidery image Microsoft Windows Prefetch (uncompressed) file Microsoft Windows Prefetch (compressed) file Microsoft MapPoint map Microsoft Office Live Meeting Connection Microsoft text-to-speech Speech Definitions File Microsoft text-to-speech Speech Data File Microsoft SQL Server Compact (CE) edition database Microsoft Image Composite Editor (ICE) Project Microsoft Digital Video Recording (DVR-MS) Symbol Dynamics EXP v1-4 document Microsoft XNA Compiled Format Microsoft Outlook or Exchange folder shortcut MIME Type application/vnd.ms-package.obfuscated-opentype video/x-ms-dvr Extension IDF MPS ODTTF PCS JEF HUS VIP PEC PES SHV VP3 SEW DST XXX ART PF PF
PTM RTC SDF SPD SDF SPJ DVR-MS WXP XNB XNK File Class adSOUND Readers adGIS adFONT adVECTORGRAPHIC adVECTORGRAPHIC adVECTORGRAPHIC adVECTORGRAPHIC adVECTORGRAPHIC adVECTORGRAPHIC adVECTORGRAPHIC adVECTORGRAPHIC adVECTORGRAPHIC adVECTORGRAPHIC adVECTORGRAPHIC adVECTORGRAPHIC adMISC adMISC adGIS adSCHEDULE adMISC adDATABASE adDATABASE adMISC adMOVIE adWORDPROCESSOR stringssr adENCAPSULATION adMISC
Format Name Number ChiWriter_Fmt 1470 ChiWriter4_Fmt 1471 Lightning_Strike_Fmt 1472 Blackberry_Executable_Fmt 1473 EndNote_Library_Fmt 1474 EndNote_Library_X_Fmt 1475 EndNote_Filter_Fmt 1476 EndNote_Style_Fmt 1477 EndNote_Connection_Fmt 1478 Camtasia_Recording_Fmt 1479 Camtasia_Project_Fmt 1480 TechSmith_Project_Fmt 1481 ABIF_Fmt 1482 CIF_Fmt 1483 Sibelius_Fmt 1484 Geogebra_Worksheet_Fmt 1485 Geogebra_Tool_Fmt 1486 Polynomial_Texture_Map_Fmt 1487 Poly_Tracker_Fmt 1488 PC_Outline_Fmt 1489 Spline_Font_Database_Fmt 1490 QuickTime_Image_Fmt 1491 XBin_Image_Fmt 1492 Segmented_Hypergraphics_Fmt 1493 LEADTools_CMP_Fmt 1494 WBMP_Fmt 1495 Category 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 Description ChiWriter document (up to version 3) ChiWriter document (version 4) Lightning Strike image Blackberry executable EndNote Library (up to version 9) EndNote Library (version X onwards) EndNote Filter EndNote Style EndNote Connection Camtasia Recording Camtasia XML Project TechSmith JSON Project Applied Biosystems Inc.
Format (ABIF) Crystallographic Information File Sibelius musical score Geogebra worksheet Geogebra tool Polynomial Texture Map (PTM) Poly Tracker audio PC-Outline document Spline Font Database (SFD) font QuickTime (QTIF) image XBin image MS Segmented Hypergraphics image LEADTools CMP image Wireless Bitmap image (WBMP) MIME Type image/cis-cod application/x-endnote-library application/x-puid-fmt-327 application/x-endnote-style application/x-endnote-connect chemical/x-cif application/vnd.geogebra.file image/x-quicktime image/vnd.wap.wbmp
Extension CHI CHI COD COD ENL ENL, ENLX ENF ENS ENZ CAMREC CAMPROJ TSCPROJ AB1, FSA CIF SIB GGB GGT PTM PTM PCO SFD QTIF, QIF, QTI XB SHG CMP WBMP File Class adWORDPROCESSOR adWORDPROCESSOR adRASTERIMAGE adEXECUTABLE Readers adDATABASE adDATABASE adDATABASE adDATABASE adDATABASE adMOVIE adWORDPROCESSOR adWORDPROCESSOR adSCIENTIFIC adSCIENTIFIC adSOUND adSCIENTIFIC adSCIENTIFIC adRASTERIMAGE adSOUND adWORDPROCESSOR adFONT adRASTERIMAGE adRASTERIMAGE adRASTERIMAGE adRASTERIMAGE adRASTERIMAGE
Format Name Number Blender_Fmt 1496 Blender_v1_Fmt 1497 Scribus_Fmt 1498 LyX_Fmt 1499 NZB_Fmt 1500 KWord_Fmt 1501 KSpread_Fmt 1502 KPresenter_Fmt 1503 KWord_GZ_Fmt 1504 KSpread_GZ_Fmt 1505 KPresenter_GZ_Fmt 1506 Karbon_Fmt 1507 KChart_Fmt 1508 KPlato_Fmt 1509 GIMP_Pattern_Fmt 1510 GIMP_Brush_Fmt 1511 GIMP_Animated_Brush_Fmt 1512 Git_Pack_Index_Fmt 1513 Git_Index_Fmt 1514 MS_Tape_Fmt 1515 STL_Binary_Fmt 1516 Unix_Shadow_Fmt 1517 MS_SQL_Log_Fmt 1518 DER_Certificate_Fmt 1519 EDIFACT_Fmt 1520 X12_Fmt 1521 Mathcad_Fmt 1522 Mathcad_XML_Fmt 1523 EDrawings_Fmt 1524 Category 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 Description Blender (v2) CAD file Blender (v1) CAD file Scribus document LyX document NewzBin NZB format KOffice KWord document KOffice KSpread
document KOffice KPresenter document KOffice (up to v1.1) kWord document KOffice (up to v1.1) kSpread document KOffice (up to v1.1) kPresenter document KOffice Karbon document KOffice KChart document KOffice KPlato document GIMP Pattern file GIMP Brush file GIMP Animated Brush file MIME Type application/x-blender application/x-blender application/vnd.scribus application/x-lyx application/x-nzb application/vnd.kde.kword application/vnd.kde.kspread application/vnd.kde.kpresenter application/x-kword application/x-kspread application/x-kpresenter application/vnd.kde.karbon application/vnd.kde.kchart application/x-vnd.kde.kplato Git Pack Index format Git Index format Microsoft Tape Format 3D Systems Stereolithography STL Binary Format Unix /etc/shadow password file Microsoft SQL Server log DER-encoded X509 certificate application/x-x509-user-cert EDIFACT-encoded EDI document application/edifact X12-encoded EDI document application/edi-x12 Mathcad MCD document application/vnd.mcd Mathcad XMCD document application/x-mathcad eDrawings Publisher document
Extension BLEND BLEND SLA LYX NZB KWD KSP KPR KWD KSP KPR KARBON CHRT KPLATO PAT GBR GIH IDX INDEX MTF, BAK LDF DER, CER EDI EDI MCD XMCD EASM, EPRT, EDRW File Class adCAD adCAD adDESKTOPPUBLSH adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adSPREADSHEET adPRESENTATION adWORDPROCESSOR adSPREADSHEET adPRESENTATION adVECTORGRAPHIC adSPREADSHEET adSCHEDULE adRASTERIMAGE adRASTERIMAGE adRASTERIMAGE Readers lyxsr adENCAPSULATION adENCAPSULATION adENCAPSULATION adCAD adMISC adDATABASE adENCAPSULATION adDATABASE adDATABASE adSCIENTIFIC adSCIENTIFIC adCAD xmlsr
Format Name Number First_Choice_DB_Fmt 1525 First_Choice_WP_Fmt 1526 First_Choice_SS_Fmt 1527 Professional_Plan_Fmt 1528 PFS_Write_Fmt 1529 Symantec_QA_Fmt 1530 Bitmap_Graphics_Array_Fmt 1531 OS2_Help_Fmt 1532 Frame_Vector_Fmt 1533 RBase_2_Fmt 1534 Harvard_Graphics_Symbol2_Fmt 1535 Freelance_Graphics_Fmt 1536 Snoop_Capture_Fmt 1537 Python_Pickle_Fmt 1538 Matlab_Pcode_Fmt 1539 Rhinoceros_3D_Fmt 1540 GL_Transmission_Binary_Fmt 1541 CAD_3DXML_Fmt 1542 CAD_3DXML_XML_Fmt 1543 Autodesk_Fusion_360_Fmt 1544 DELFTship_Fmt 1545 Autodesk_Inventor_Drawing_Fmt 1546 Autodesk_Inventor_Part_Fmt 1547 Autodesk_Inventor_Assembly_Fmt 1548 Autodesk_Revit_Fmt 1549 Category 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 Description PFS First Choice database PFS First Choice word-processing document PFS First Choice spreadsheet PFS Professional Plan spreadsheet PFS Professional Write document Symantec Q&A Database OS/2 Bitmap Graphics Array OS/2 Help/INF document Frame Vector Metafile R:Base database (v2-v4) Harvard Graphics Symbol File (v2) Lotus Freelance Graphics image Snoop Packet Capture file Python Pickle file Matlab P-code file Rhinoceros 3D Model Graphics Language (GL) Binary Transmission Format 3DVIDIA 3DXML archive 3DVIDIA 3DXML XML document Autodesk Fusion 360 model DELFTship or FREE!ship model Autodesk Inventor drawing Autodesk Inventor part Autodesk Inventor assembly Autodesk Revit document MIME Type database/x-firstchoice application/x-first-choice application/x-pfs-plan application/x-pfsprofessionalwrite image/bga model/gltf+binary application/x-3dxmlplugin
Extension FOL DOC SS PFS DTF BGA, BMP, ICO HLP, INF FMV RBF SYM DRW CAP, SNOOP PICKLE, PKL, P P 3DM GLB 3DXML 3DXML F3D FBM IDW IPT IAM RVT, RFA, RTE, RFT File Class adDATABASE adWORDPROCESSOR adSPREADSHEET adSPREADSHEET adWORDPROCESSOR adDATABASE adRASTERIMAGE Readers adWORDPROCESSOR adVECTORGRAPHIC adDATABASE adVECTORGRAPHIC adRASTERIMAGE adENCAPSULATION adEXECUTABLE adSOURCECODE adCAD adCAD adCAD adCAD adCAD adCAD adCAD adCAD adCAD adCAD
Format Name Number FreeCAD_Fmt 1550 Solid_Edge_Part_Fmt 1551
Solid_Edge_Assembly_Fmt 1552 Solid_Edge_SheetMetal_Fmt 1553 SolidWorks_Visualize_Project_Fmt 1554 Apache_Parquet_Fmt 1555 AES_Crypt_Fmt 1556 SO_Math_XML_Fmt 1557 MathML_Fmt 1558 Photoshop_Brush_Fmt 1559 Photoshop_Color_Book_Fmt 1560 Premiere_Project_Fmt 1561 Premiere_Title_Fmt 1562 Premiere_Pro_Title_Fmt 1563 Memgraph_Fmt 1564 Memgraph_XML_Fmt 1565 AV1_Image_Fmt 1566 AV1_Image_Sequence_Fmt 1567 IVF_Fmt 1568 AV1_Image_IVF_Fmt 1569 VP8_IVF_Fmt 1570 HPROF_Fmt 1571 XLIFF_Compressed_Fmt 1572 Scenarist_Caption_Fmt SubRip_Text_Fmt EBU_Subtitling_Fmt 1573 1574 1575 Category 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 Description FreeCAD document Solid Edge part Solid Edge assembly Solid Edge sheet metal SolidWorks Visualize project Apache Parquet document AES Crypt document OpenDocument format (OpenOffice 1/StarOffice 6,7) Math XML MathML document Adobe Photoshop Brush document Adobe Photoshop Color Book Adobe Premiere Elements/Pro project Adobe Premiere title document Adobe Premiere Pro title document Memgraph database plist format Memgraph database XML format AV1 Image Format (AVIF) AV1 Image Sequence Format (AVIFS) IVF container document AV1 Image (IVF container) VP8 Video (IVF container) HPROF Java Profiler document XML Localization Interchange File Format compressed (XLIFF) Scenarist Closed Caption document SubRip Text (SRT) subtitles document EBU Subtitling data exchange format MIME Type application/vnd.sun.xml.math application/mathml+xml image/x-adobe-photoshop-brush application/x-bplist-memgraph image/avif image/avif-sequence image/avif application/vnd.java.hprof application/xliff+zip
Extension FCSTD PAR ASM PSM SVPJ PARQUET AES SXM MML, MATHML ABR ACB PRPROJ, PREL PTL PRTL MEMGRAPH MEMGRAPH AVIF AVIFS IVF AVIF, AVIFS VP8 HPROF XLZ SCC SRT STL File Class adCAD adCAD adCAD Readers olesr olesr adCAD olesr adCAD adDATABASE adENCAPSULATION adMISC parquetsr
adMISC adMISC adMISC adMISC adMISC adMISC adDATABASE adDATABASE adRASTERIMAGE adANIMATION adRASTERIMAGE adRASTERIMAGE adMOVIE adMISC adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR Page 166 of 284 Filter SDK Java Programming Guide Appendix A: Supported Formats Format Name Apache_ORC_Fmt NES_Sound_Fmt IW13_IWA_Fmt BioRad_Image_Fmt NIfTI_Fmt MRC_DV_Fmt MRC_CCP4_Fmt ECAT_PET_Fmt OME_XML_Fmt Panasonic_RAW_Fmt Panasonic_RW2_Fmt FujiFilm_RAF_Fmt Olympus_ORF_Fmt HEVC_Fmt PAM_Fmt Paris_Audio_Fmt Calendar_Creator_Fmt Number 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 Category 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 Description Apache ORC (Optimized Row Columnar) data NES Sound File Apple iWork 2013 IWA document BioRad confocal image NIfTI (NII) neuroimaging document MRC Deltavision (DV) / Priism image MRC CCP4 2014 image ECAT medical PET image Open Microscopy Environment (OME) XML document Panasonic RAW or Leica RWL image Panasonic RW2 image FujiFilm RAF image Olympus ORF image High Efficiency Video Coding (HEVC) MP4 document Portable Arbitrary Map (PAM) image Paris Audio Format Broderbund Calendar Creator document (v4+) MIME Type image/x-panasonic-raw image/x-panasonic-rw2 image/x-fuji-raf image/x-olympus-orf video/h265 image/x-portable-arbitrarymap IWork_2013_Protected_ Fmt Corel_Wavelet_WVL_ Fmt Corel_Wavelet_WI_Fmt Corel_Painter_RIF_Fmt OmniPage_MET_Fmt OmniPage_OPD_Fmt GPS_Exchange_Fmt GL_Transmission_Fmt CorelChart_Fmt LocoScript_PCW_Fmt 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 iWork 2013 password-protected document Corel Wavelet WVL image Corel Wavelet WI image Corel Painter RIFF image Caere OmniPage MET document Caere OmniPage OPD document GPS Exchange Format GL Transmission Text Format CorelChart document LocoScript document for Amstrad PCW application/gpx+xml model/gltf+json IDOL KeyView (12.13) Extension 
ORC NSF IWA PIC NII DV MRC V XML File Class adDATABASE adSOUND adMISC adSCIENTIFIC adSCIENTIFIC adSCIENTIFIC adSCIENTIFIC adSCIENTIFIC adSCIENTIFIC Readers orcsr RAW, RWL RW2 RAF ORF HEVC, H265 adRASTERIMAGE adRASTERIMAGE adRASTERIMAGE adRASTERIMAGE adMOVIE PAM adRASTERIMAGE FAP, PAF adSOUND CC3, CE3, CC5, BCC adSCHEDULE PAGES, NUMBERS, adWORDPROCESSOR KEY WVL adRASTERIMAGE WI RIF MET OPD GPX GLTF CCH adRASTERIMAGE adRASTERIMAGE adMISC adMISC adGIS adCAD adVECTORGRAPHIC adWORDPROCESSOR Page 167 of 284 Filter SDK Java Programming Guide Appendix A: Supported Formats Format Name LocoScript_DOS_Fmt IWork_2005_Protected_ Fmt JAR_Pack_Fmt GDIFF_Fmt AFP_Fmt Number 1603 1604 1605 1606 1607 NSIF_Fmt 1608 XSL_FO_Fmt 1609 Consolidated_CDA_Fmt 1610 WebAssembly_Binary_ Fmt 1611 Visual_Studio_SDF_Fmt 1612 MS_Pocket_Word_ PocketPC_Fmt PEA_Fmt MS_Pocket_Excel_ PocketPC_Fmt TTML_Fmt Visual_SourceSafe_ SCC_Fmt NetBeans_Profiler_Fmt Mac_Alias_Fmt Firebird_DB_Fmt InterBase_DB_Fmt LZip_Fmt UltraCompressor_Fmt PostgreSQL_Filenode_ Fmt Zebra_Metafile_Fmt Kodak_Cineon_Fmt Apple_Image2_Fmt 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 Category 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1567 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 Description LocoScript document for MS-DOS iWork 2005-2009 password-protected document Java Archive compressed with pack200 GDIFF (Generic Diff) document IBM Advanced Function Presentation (AFP) image NATO Secondary Image Format (NSIF) image XSL Formatting Object (XSL-FO) Consolidated CDA document WebAssembly (WASM) binary-code Microsoft Visual Studio browsing database (sdf) file Microsoft Pocket Word for Pocket PC PEA (Pack, Encrypt, Authenticate) archive Microsoft Pocket Excel for Pocket PC Timed Text Markup Language (TTML) document Microsoft Visual SourceSafe SCC (Source Code Control) file Java NetBeans Profiler snapshot Mac OS alias file Firebird database InterBase database lzip compressed 
archive UltraCompressor II archive PostgreSQL mapped relation file (pg_ filenode.map) Zoner Zebra Metafile image Kodak Cineon image Apple iOS Image2 document MIME Type application/x-java-pack200 application/gdiff application/vnd.ibm.modcap application/wasm application/lzip IDOL KeyView (12.13) Extension File Class adWORDPROCESSOR PAGES, NUMBERS, adWORDPROCESSOR KEY PACK adENCAPSULATION adMISC AFP adRASTERIMAGE Readers NSF FO, XSLFO XML WASM adRASTERIMAGE adWORDPROCESSOR adWORDPROCESSOR adEXECUTABLE xmlsr xmlsr SDF adSWDEV PSW, PWI adWORDPROCESSOR PEA PXL adENCAPSULATION adSPREADSHEET TTML SCC adWORDPROCESSOR adMISC NPS FDB GDB LZ UC2 MAP adSWDEV adMISC adDATABASE adDATABASE adENCAPSULATION adENCAPSULATION adDATABASE ZBR CIN IMG2 adVECTORGRAPHIC adRASTERIMAGE adOS Page 168 of 284 Filter SDK Java Programming Guide Appendix A: Supported Formats Format Name Apple_Image3_Fmt Apple_Image4_Fmt Apple_EFI_Image_Fmt Secure_Capsule_Fmt Compact_Font_Fmt QML_Cached_Fmt KV_Mail_Subfile_Fmt JSON_Fmt DesignPro_Fmt Edraw_Max_Fmt ActivInspire_Fmt ActivStudio_Fmt Gravit_Designer_Fmt SANM_Fmt ICEDraw_Fmt MS_Equation_Fmt Affinity_Fmt Number 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 Category 1558 1559 1560 1561 1562 1563 1564 1565 1566 1568 1569 1570 1571 1572 1573 1574 1575 Description Apple iOS Image3 document Apple iOS Image4 document Apple EFI Image MacOS Secure Capsule firmware update Adobe Compact Font Format (CFF) QML Cached document Internal mail file produced by KeyView extraction from a mail container format JSON document Avery DesignPro document Edraw Max document ActivInspire flipchart document ActivStudio and ActivPrimary document Gravit Designer document LucasArts Smush SANM animation iCEDraw character graphics image Microsoft Equation Editor object Affinity Photo/Publisher/Designer document MIME Type application/font-cff application/json IOS_App_Store_ Package_Fmt Minitab_Worksheet_Fmt Minitab_Worksheet_12_ Fmt 
Minitab_Worksheet_14_ Fmt Minitab_Worksheet_19_ Fmt Minitab_Project_Fmt Minitab_Project_19_Fmt NIST_ITL_Fmt Silo_SIA_Fmt 1645 1646 1647 1648 1649 1650 1651 1652 1653 1576 1577 1578 1579 1580 1581 1582 1583 1584 iOS App Store Package Minitab worksheet v5-6 Minitab worksheet v12-13 Minitab worksheet v14-18 Minitab worksheet v19- Minitab project up to v18 Minitab project v19NIST-ITL standard data Nevercenter Silo 3D ASCII model IDOL KeyView (12.13) Extension IMG3 IMG4, IM4M EFIRES SCAP CFF QMLC MAIL File Class adOS adOS adOS adOS adFONT adSWDEV adWORDPROCESSOR Readers afsr JSON adWORDPROCESSOR ZDL, ZDP adPRESENTATION EDDX adPRESENTATION FLIPCHART adPRESENTATION FLP adPRESENTATION GVDESIGN adVECTORGRAPHIC SNM, ZNM adANIMATION IDF adRASTERIMAGE adWORDPROCESSOR AFPHOTO, AFPUB, adRASTERIMAGE AFDESIGN, AFTEMPLATE IPA adENCAPSULATION MTW MTW adSCIENTIFIC adSCIENTIFIC MTW adSCIENTIFIC MWX adSCIENTIFIC MPJ MPX XML SIA adSCIENTIFIC adSCIENTIFIC adSCIENTIFIC adCAD Page 169 of 284 Filter SDK Java Programming Guide Appendix A: Supported Formats Format Name Silo_SIB_Fmt XCBF_Fmt Zoner_Draw_OLE_Fmt Number 1654 1655 1656 Zoner_Photo_Studio_Fmt 1657 Calligra_Plan_Fmt 1658 Symbol_Dynamics_ EXP5_Fmt 1659 REX2_Fmt 1660 WPS_Office_WP_Fmt 1661 WPS_Office_PG_Fmt 1662 WPS_Office_SS_Fmt 1663 MS_InfoPath_Fmt 1664 MS_InfoPath_XSF_Fmt 1665 PerfectWorks_Fmt 1666 CAJ_Fmt 1667 CAJ2_Fmt 1668 KDH_Fmt 1669 MS_DLL_Fmt 1670 Hancom_Cell_2010_Fmt 1671 ESRI_Layer_Fmt 1672 JPEG_XL_Fmt 1673 NES_ROM_Fmt 1674 Base64_ASCII_Fmt 1675 InDesign1_Fmt 1676 HP_PCL_XL_Fmt 1677 SubStation_Alpha_Fmt 1678 SAMI_Fmt 1679 Advanced_Authoring_ Fmt 1680 Category 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 Description Nevercenter Silo 3D binary model XML Common Biometric Format Zoner Draw / Zoner Callisto Metafile (ZMF) version 2-3 Zoner Photo Studio document Calligra Plan document Symbol Dynamics EXP v5+ document MIME Type 
application/x-vnd.kde.plan REX2 audio file Kingsoft WPS Office Writer application/wps-office.wps Kingsoft WPS Office Presentation application/wps-office.dps Kingsoft WPS Office Spreadsheet application/wps-office.et Microsoft InfoPath document Microsoft InfoPath form definition Novell PerfectWorks document Chinese Academic Journal CAJ document (2010-) Chinese Academic Journal CAJ document (20052010) Chinese Academic Journal KDH document (20002005) Microsoft Dynamic Link Library (DLL) Hancom Office Cell 2010 document ESRI Layer file application/x-esri-layer JPEG XL image image/jxl Nintendo Entertainment System (NES) ROM application/x-nesrom Base64-encoded ASCII text file Adobe InDesign v1 document application/x-indesign HP Printer Control Language XL (PCL XL) application/vnd.hp-pclxl SubStation Alpha subtitle document Synchronized Accessible Media Interchange (SAMI) subtitle document Advanced Authoring Format (AAF) for data interchange IDOL KeyView (12.13) Extension SIB XML ZMF ZPS PLAN WXP RX2 WPS, DOC DPS, PPT ET, XLS XSN XSF WPW CAJ CAJ KDH, CAJ DLL, PYD CELL LYR JXL NES INDD PXL, PRN SSA, ASS SMI, SAMI AAF File Class adCAD adSCIENTIFIC adVECTORGRAPHIC Readers adRASTERIMAGE adSCHEDULE adWORDPROCESSOR stringssr adSOUND adWORDPROCESSOR adPRESENTATION adSPREADSHEET adENCAPSULATION adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR adWORDPROCESSOR mw8sr kpp97rdr xlssr adWORDPROCESSOR adLIBRARY adSPREADSHEET adGIS adRASTERIMAGE adMISC adENCAPSULATION adDESKTOPPUBLSH adVECTORGRAPHIC adWORDPROCESSOR adWORDPROCESSOR pxlsr adMOVIE Page 170 of 284 Filter SDK Java Programming Guide Appendix A: Supported Formats Format Name Number MF_COBOL_Library_Fmt 1681 MF_COBOL_ Intermediate_Fmt 1682 MF_COBOL_Generated_ 1683 Fmt Autodesk_EAGLE_Fmt 1684 Autodesk_EAGLE_XML_ 1685 Fmt Omnis_Studio_Fmt 1686 Seclore_Fmt 1687 Acorn_Draw_Fmt 1688 Hadoop_Sequence_File_ 1689 Fmt Archicad_GSM_Fmt 1690 Autodesk_Point_Cloud_ 1691 Fmt Autodesk_ReCap_Scan_ 1692 Fmt Autodesk_ReCap_ Project_Fmt 1693 
BRL_CAD_Binary_Fmt 1694 Cartesian_Perceptual_ Compression_Fmt 1695 Clarion_Database_Fmt 1696 ColoRIX_Fmt 1697 Compressed_ISO_Fmt 1698 Corel_RAVE_Fmt 1699 Clicker_eBook_Fmt 1700 Datafork_TrueType_Fmt 1701 Dzip_Fmt 1702 Digital_Symphony_Fmt 1703 Disk_Archiver_Fmt 1704 E57_LIDAR_Fmt 1705 Category 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 Description Micro Focus COBOL library Micro Focus Net Express intermediate file Micro Focus COBOL generated code file Autodesk EAGLE library Autodesk EAGLE XML library Omnis Studio file a Seclore-encrypted document whose format cannot be determined Acorn Draw image Apache Hadoop sequence file Archicad library part (GSM) file Autodesk Indexed Point Cloud Autodesk ReCap Scan Autodesk ReCap Project BRL-CAD binary database (v5) Cartesian Perceptual Compression image Clarion database ColoRIX image Compressed ISO CD image (CISO) Corel R.A.V.E. animation Crick Clicker eBook Datafork TrueType font Dzip archive Digital Symphony audio Disk Archiver archive E57 LIDAR point cloud file MIME Type model/vnd.gdl image/cpi application/x-compressed-iso application/x-dfont application/x-dzip application/x-dar IDOL KeyView (12.13) Extension LBR INT File Class adLIBRARY adLIBRARY Readers GNT adLIBRARY LBR adCAD LBR adCAD DF1, LBR, LBS adDATABASE adENCAPSULATION SEQUENCEFILE adVECTORGRAPHIC adDATABASE GSM PCG adCAD adCAD RCS adCAD RCP adCAD G CPC, CPI adCAD adRASTERIMAGE DAT RIX, SCX, SCI CSO CLK CLK DFONT DZ DSYM DAR E57 adDATABASE adRASTERIMAGE adENCAPSULATION adANIMATION adWORDPROCESSOR adFONT adENCAPSULATION adSOUND adENCAPSULATION adRASTERIMAGE Page 171 of 284 Filter SDK Java Programming Guide Appendix A: Supported Formats Format Name EGG_Archive_Fmt ALZ_Archive_Fmt Flipkart_Fmt GNumeric_Fmt Genus_Graphics_ Library_Fmt GST_Publisher_Fmt IBM_DSK_Fmt Number 1706 1707 1708 1709 1710 Category 1637 1638 1639 1640 1641 Description ESTsoft ALzip EGG archive ESTsoft ALzip ALZ 
archive Flipkart eBook GNOME GNumeric document Genus Graphics Library 1711 1712 1642 1643 GST/Greenstreet/Pressworks/Publish It/Timeworks document IBM SaveDskF (SKF) disk image MIME Type application/x-gnumeric DRAWIO_Fmt 1713 LightWave_Scene_Fmt 1714 LZO_Fmt 1715 MS_Access_Snapshot_ 1716 Fmt MS_Report_Definition_ Fmt 1717 MS_Shared_Dataset_ Fmt 1718 MS_Report_Data_ Source_Fmt 1719 MS_Windows_Script_Fmt 1720 Mozilla_Archive_Fmt 1721 Mozilla_LZ4_Fmt 1722 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 DRAWIO image LightWave Scene (LWSC) document lzop compressed archive Microsoft Access snapshot file application/vnd.ms-access Microsoft SQL Server Report Definition Language (RDL) document Microsoft SQL Server Shared Dataset (RSD) document Microsoft SQL Server Shared Report Data Source (RDS) document Microsoft Windows Script File (WSF) Mozilla Archive package Mozilla mozLZ4 compressed data OneNote_Package_Fmt OneNote_TOC_Fmt Open_Financial_ Exchange_Fmt Open_Financial_ Exchange_v1_Fmt PCAP_NG_Fmt PageStream_Fmt PlayStation_PSMF_Fmt 1723 1724 1725 1726 1727 1728 1729 1654 1655 1656 1657 1658 1659 1660 Microsoft OneNote Package Microsoft OneNote Table of Contents Open Financial Exchange XML file Open Financial Exchange version 1 file Wireshark PCAP Next Generation capture PageStream document Sony PlayStation Portable Media Format application/x-ofx IDOL KeyView (12.13) Extension EGG ALZ FKB GNUMERIC GX, GXL File Class adENCAPSULATION adENCAPSULATION adWORDPROCESSOR adSPREADSHEET adENCAPSULATION Readers DTP adDESKTOPPUBLSH DSK, 1DK, 2DK, 3DK DRAWIO LWS LZO SNP adENCAPSULATION adVECTORGRAPHIC adCAD adENCAPSULATION adDATABASE RDL adDATABASE RSD adDATABASE RDS adDATABASE WSF adEXECUTABLE MAR adENCAPSULATION BAKLZ4, JSONLZ4, adENCAPSULATION MOZLZ4 ONEPKG adENCAPSULATION ONETOC2 adWORDPROCESSOR OFX adWORDPROCESSOR onesr OFX adWORDPROCESSOR PCAPNG, NTAR PGS PMF adENCAPSULATION adDESKTOPPUBLSH adMOVIE Page 172 of 284 Filter SDK Java Programming Guide Appendix A: Supported 
Formats Format Name Number Puffer_Fmt 1730 QuickBooks_Company_ 1731 Fmt QuickBooks_Backup_ OLE_Fmt 1732 QuickBooks_Backup_Fmt 1733 RIFF_MIDS_Fmt 1734 SoftMaker_TextMaker_ Fmt 1735 SoftMaker_PlanMaker_ Fmt 1736 SoftMaker_TextMaker_ XML_Fmt 1737 SoftMaker_PlanMaker_ XML_Fmt 1738 SoftMaker_ 1739 Presentations_XML_Fmt Squash_Fmt 1740 Survex_Fmt 1741 TopSpeed_Data_Fmt 1742 TurboCAD_Drawing_Fmt 1743 Utah_RLE_Fmt 1744 Xbox_Executable_Fmt 1745 ZIM_Fmt 1746 ZPAQ_Fmt 1747 LZX_Fmt 1748 NetWare_Packed_Fmt 1749 Pax_Archive_Fmt 1750 SQX_Fmt 1751 Category 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 688 1674 1675 1676 1677 1678 1679 1680 1681 1682 Description Puffer encrypted archive QuickBooks Company file QuickBooks Backup file (OLE format) QuickBooks Backup file (binary format) RIFF MIDS MIDI stream SoftMaker TextMaker TMD document SoftMaker PlanMaker PMD document SoftMaker TextMaker TMDX document SoftMaker PlanMaker PMDX document SoftMaker Presentations PMDX document Squash (&FCA) compressed data Survex 3d image Clarion TopSpeed Data file TurboCAD drawing Utah RLE image Microsoft Xbox executable ZIM compressed archive ZPAQ compressed archive LZX compressed archive Novell Personal NetWare packed file pax (portable archive exchange) archive SQX archive MIME Type Extension PUF QBW QBB QBB MDS TMD, TMV PMD, PMV TMDX, TMVX PMDX, PMVX PRDX, PRVX 3D TPS TCW, TCT RLE XBE ZIM, ZIMAA ZPAQ LZX PAX SQX File Class adENCAPSULATION adACCOUNTING Readers adACCOUNTING adACCOUNTING adSOUND adWORDPROCESSOR adSPREADSHEET adWORDPROCESSOR adSPREADSHEET adPRESENTATION adENCAPSULATION adVECTORGRAPHIC adMISC adCAD adRASTERIMAGE adEXECUTABLE adENCAPSULATION adENCAPSULATION adENCAPSULATION adENCAPSULATION adENCAPSULATION adENCAPSULATION 1MHT, EML, and MBX files might return either format 2, 233, or 395, depending on the text in the file. 
In general, files that contain fields such as To, From, Date, or Subject are considered to be email messages; files that contain fields such as content-type and mime-version are considered to be MHT files; and files that do not contain any of those fields are considered to be text files.
2 All CAT file extensions, for example CATDrawing, CATProduct, CATPart, and so on.
3 This format is returned only if you enable source code identification. See Source Code Identification, on page 85.
4 This format is returned only if you enable extended source code identification. See Source Code Identification, on page 85.

File Classes

Attribute Number  Description             File class
0                 No file class           AutoDetNoFormat
01                Word processor          adWORDPROCESSOR
02                Spreadsheet             adSPREADSHEET
03                Database                adDATABASE
04                Raster image            adRASTERIMAGE
05                Vector graphic          adVECTORGRAPHIC
06                Presentation            adPRESENTATION
07                Executable              adEXECUTABLE
08                Encapsulation           adENCAPSULATION
09                Sound                   adSOUND
10                Desktop publishing      adDESKTOPPUBLSH
11                Outline/planning        adOUTLINE
12                Miscellaneous           adMISC
13                Mixed format            adMIXED
14                Font                    adFONT
15                Time scheduling         adSCHEDULE
16                Communications          adCOMMUNICATION
17                Object module           adOBJECTMODULE
18                Library module          adLIBRARY
19                Fax                     adFAXFORMAT
20                Movie                   adMOVIE
21                Animation               adANIMATION
22                Source Code             adSOURCECODE
23                Computer-Aided Design   adCAD
24                BI and analysis tools   adANALYTICS
25                Scientific data         adSCIENTIFIC
26                Geographic Info System  adGIS
27                Software Development    adSWDEV
28                Operating System        adOS
29                Accounting software     adACCOUNTING

Appendix B: Document Readers

This section lists the KeyView document readers that are available to filter, export, and view supported file formats.
· Key to Document Readers Table
· Document Readers

Key to Document Readers Table

The document readers table includes the following information.

Column                   Description
Reader                   The name of the reader.
Description              A description of the reader.
Filter                   Shows whether KeyView can filter text from the main content of the file.
Export                   Shows whether KeyView supports export to HTML, XML, and PDF.
View                     Shows whether KeyView provides viewing capability.
Extract                  Shows whether KeyView can extract sub-files.
Metadata                 Shows whether KeyView can extract metadata (properties such as title, author, and subject).
Charset                  Shows whether KeyView can detect and extract the character set. Even though a file format might be able to provide character set information, some documents might not contain character set information. Therefore, the document reader would not be able to determine the character set of the document.
H/F                      Shows whether KeyView can extract headers and footers.
Associated File Formats  The file formats that are supported by the reader.

Key to Symbols

Symbol  Description
Y       The feature is supported.
N       The feature is not supported.
P       Partial metadata is extracted from this format. Some non-standard fields are not extracted.
T       Only text is extracted from this format. Formatting information is not extracted.
M       Only metadata (title, subject, author, and so on) is extracted from this format. Text and formatting information are not extracted.
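The symbol key can be captured in code when the readers table is processed programmatically, for example to check whether a reader supports a given feature. The following is a minimal sketch under stated assumptions: `SupportLevel`, `fromSymbol`, and `SupportKeyDemo` are hypothetical names chosen for illustration and are not part of the KeyView API. Only the five symbols in the key are recognized; other cell values (such as the `n/a` that appears in some Charset cells) are rejected.

```java
import java.util.Locale;

/** Hypothetical helper (not a KeyView type) that models the Key to Symbols table. */
enum SupportLevel {
    SUPPORTED("Y"),         // The feature is supported.
    NOT_SUPPORTED("N"),     // The feature is not supported.
    PARTIAL_METADATA("P"),  // Partial metadata is extracted.
    TEXT_ONLY("T"),         // Only text is extracted.
    METADATA_ONLY("M");     // Only metadata is extracted.

    private final String symbol;

    SupportLevel(String symbol) {
        this.symbol = symbol;
    }

    /** Returns the one-letter symbol used in the readers table. */
    String symbol() {
        return symbol;
    }

    /** Maps a table cell such as "Y" or " t " to its support level. */
    static SupportLevel fromSymbol(String cell) {
        String normalized = cell.trim().toUpperCase(Locale.ROOT);
        for (SupportLevel level : values()) {
            if (level.symbol.equals(normalized)) {
                return level;
            }
        }
        // Anything outside the documented key (for example "n/a") is rejected.
        throw new IllegalArgumentException("Unknown symbol: " + cell);
    }
}

class SupportKeyDemo {
    public static void main(String[] args) {
        // Print the mapping for each symbol listed in the key.
        for (String cell : new String[] {"Y", "N", "P", "T", "M"}) {
            System.out.println(cell + " -> " + SupportLevel.fromSymbol(cell));
        }
    }
}
```

Rejecting unknown values rather than guessing keeps the sketch faithful to the key as documented; a caller that needs to handle `n/a` would have to decide separately how to represent it.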
Document Readers

Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
ActiveX components | Microsoft Visio (2013) | N | N | Y [1] | N | Y | N | N | MS_Visio_2013_Fmt
ad1sr | AD1 Evidence file | N | N | Y | Y | N | n/a | N | AD1_Fmt
afmsr | Adobe Font Metrics | Y | T | T | N | N | N | N | Adobe_Font_Metrics_Fmt
afsr | ASCII Text | Y | Y | Y | N | N | N | N | ABAP_Fmt, AMPL_Fmt, APL_Fmt, ASCII_Text_Fmt, ASN1_Fmt, ATS_Fmt, Agda_Fmt, Alloy_Fmt, Apex_Fmt, AppleScript_Fmt, Arduino_Fmt, AsciiDoc_Fmt, AspectJ_Fmt, Assembly_Fmt, Awk_Fmt, BlitzMax_Fmt, Bluespec_Fmt, Brainfuck_Fmt, Brightscript_Fmt, CLIPS_Fmt, CMake_Fmt, COBOL_Fmt, CPlusPlus_Fmt, CWeb_Fmt, C_Fmt, CartoCSS_Fmt, Ceylon_Fmt, Chapel_Fmt, Clarion_Fmt, Clean_Fmt, Clojure_Fmt, CoffeeScript_Fmt, Component_Pascal_Fmt, Cool_Fmt, Coq_Fmt, Creole_Fmt, Crystal_Fmt, Csharp_Fmt, Csound_Document_Fmt, Csound_Fmt, Css_Fmt, Cuda_Fmt, DIGITAL_Command_Language_Fmt, DTrace_Fmt, D_Fmt, Dart_Fmt, Dockerfile_Fmt, ECL_Fmt, E_Fmt, Eiffel_Fmt, Elm_Fmt, Emacs_Lisp_Fmt, EmberScript_Fmt, Erlang_Fmt, Fantom_Fmt, Forth_Fmt, Fortran_Fmt, FreeMarker_Fmt, Frege_Fmt, Fsharp_Fmt, GAMS_Fmt, GAP_Fmt, GDScript_Fmt, GIS_World_File_Fmt, GLSL_Fmt, G_code_Fmt, Game_Maker_Language_Fmt, Gnuplot_Fmt, Go_Fmt, Golo_Fmt, Gosu_Fmt, Gradle_Fmt, GraphQL_Fmt, Graphviz_DOT_Fmt, Groovy_Fmt, HLSL_Fmt, Hack_Fmt, Haml_Fmt, Handlebars_Fmt, Haskell_Fmt, Hy_Fmt, IDL_Fmt, IGOR_Pro_Fmt, Idris_Fmt, Inform_7_Fmt, Ini_Fmt, Ioke_Fmt, Isabelle_Fmt, JSONiq_Fmt, JSX_Fmt, J_Fmt, Jasmin_Fmt, Java_Fmt, Javascript_Fmt, Jolie_Fmt, Julia_Fmt, KV_Mail_Subfile_Fmt, KiCad_Layout_Fmt, KiCad_Schematic_Fmt, Kotlin_Fmt, LFE_Fmt, LOLCODE_Fmt, Lasso_Fmt, Limbo_Fmt, Lisp_Fmt, LiveScript_Fmt, Lua_Fmt, MAXScript_Fmt, ML_Fmt, MSDOS_Batch_File_Fmt, M_Fmt, Makefile_Fmt, Markdown_Fmt, Mathematica_Fmt, Matlab_Fmt, Max_Code_Fmt, Mercury_Fmt, Modelica_Fmt, Modula_2_Fmt, Monkey_Fmt, Moocode_Fmt, NL_Fmt, NSIS_Fmt, NetLogo_Fmt, NewLisp_Fmt, Nginx_Fmt, Nix_Fmt, Nu_Fmt, OCaml_Fmt, ObjC_Fmt, ObjCpp_Fmt, ObjJ_Fmt, OpenCL_Fmt, OpenEdge_ABL_Fmt, OpenSCAD_Fmt, Ox_Fmt, Oxygene_Fmt, Oz_Fmt, PAWN_Fmt, PHP_Fmt, PLSQL_Fmt, PLpgSQL_Fmt, Pan_Fmt, Parrot_Assembly_Fmt, Pascal_Fmt, Perl_Fmt, PicoLisp_Fmt, Pike_Fmt, Pony_Fmt, Powershell_Fmt, Processing_Fmt, Prolog_Fmt, Puppet_Fmt, PureBasic_Fmt, Python_Fmt, QMake_Fmt, RAML_Fmt, RDoc_Fmt, REXX_Fmt, R_Fmt, Racket_Fmt, Ragel_Fmt, Rascal_Fmt, Rebol_Fmt, Red_Fmt, RenPy_Fmt, RenderScript_Fmt, Ring_Fmt, RobotFramework_Fmt, Ruby_Fmt, Rust_Fmt, SAS_Fmt, SGML_Fmt, SPARQL_Fmt, SQLPL_Fmt, SQL_Fmt, SaltStack_Fmt, Scala_Fmt, Scheme_Fmt, Scilab_Fmt, Scribe_Fmt, Shell_Fmt, Smalltalk_Fmt, Squirrel_Fmt, Stan_Fmt, Stata_Fmt, Stylus_Fmt, SuperCollider_Fmt, Swift_Fmt, SystemVerilog_Fmt, TSV_Fmt, TSV_Fmt, TXL_Fmt, Tcl_Fmt, Tex_Fmt, Turing_Fmt, Turtle_Fmt, TypeScript_Fmt, UrWeb_Fmt, Verilog_Fmt, Vim_script_Fmt, Visual_Basic_Fmt, WebAssembly_Fmt, WebIDL_Fmt, Wiki_Fmt, X10_Fmt, XQuery_Fmt, Xojo_Fmt, Xtend_Fmt, YAML_Fmt, YANG_Fmt, Zephir_Fmt, eC_Fmt, reStructuredText_Fmt, xBase_Fmt
aiffsr | Audio Interchange File Format | M | N | N | N | Y | N | N | AIFF_Fmt
asfsr | Advanced Systems Format (1.2) | N | N | N | N | Y | N | N | ASF_Fmt, WMA_Fmt, WMV_Fmt
assr | Applix Spreadsheets (4.2, 4.3, 4.4) | Y | Y | Y | N | N | Y | N | Applix_Spreadsheets_Fmt
avrosr [2] | Apache Avro binary format | Y | N | N | N | N | N | N | Avro_Fmt
awsr | Applix Words (3.11, 4, 4.1, 4.2, 4.3, 4.4) | Y | Y | Y | N | N | Y | Y | Applix_Words_Fmt
axsr | Applix Asterix | Y | T | T | N | N | N | N | Applix_Alis_Fmt
b1sr | B1 | N | N | Y | Y | N | n/a | N | B1_Fmt
bkfsr | Microsoft Backup File | N | N | Y | Y | N | n/a | N | BKF_Fmt
bmpsr | Windows Bitmap Image | M | M | N | N | Y | N | N | BMP_Fmt
bzip2sr | Bzip2 Compressed File | N | N | Y | Y | N | n/a | N | BZIP2_Fmt
cabsr | Microsoft Cabinet File (1.3) | N | N | Y | Y | N | n/a | N | CAB_Fmt
cdsr | Convergent Technologies DEF Comm. Format | Y | T | T | N | N | N | N | CT_DEF_Fmt
cebsr [3] | Founder Chinese E-paper Basic (3.2.1) | Y | N | N | N | N | N | N | Founder_CEB_Fmt

[1] Visio 2013 is supported in Viewing only, with the support of ActiveX components from the Microsoft Visio 2013 Viewer. Image fidelity is supported but other features, such as highlighting, are not.
[2] The avrosr reader is only available on certain platforms (see avrosr in the platform differences section).
[3] The cebsr reader is only available on certain platforms (see cebsr in the platform differences section). Because of known security vulnerabilities in the third party library used for this format, cebsr is disabled in formats.ini and needs to be explicitly enabled if you wish to use it.
Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
chmsr | Microsoft Compiled HTML Help (3) | N | N | Y | Y | N | n/a | N | CHM_Fmt
csvsr | CSV (Comma Separated Values) | Y | Y | Y | N | N | N | N | CSV_Fmt
dbfsr | dBase Database (III+, IV) | Y | Y | Y | N | N | N | N | dBase_Fmt
dbxsr | Microsoft Outlook Express DBX Message Database (5.0, 6.0) | N | N | Y | Y | Y | Y | N | MS_OEDBX_Fmt
dcasr | IBM DCA/RFT (Revisable Form Text) (SC23-0758-1) | Y | Y | Y | N | N | Y | N | DCA_RFT_Fmt
dcmsr | Digital Imaging & Communications in Medicine (DICOM) | M | N | N | N | Y | N | N | Dicom_Fmt
difsr | Data Interchange Format | Y | Y | Y | N | N | N | N | DIF_SpreadSheet_Fmt
dmgsr | Mac Disk Copy Disk Image | N | N | Y | Y | N | n/a | N | DMG_Fmt
dw4sr | DisplayWrite (4) | Y | Y | Y | N | N | Y | N | IBM_Display_Write_Fmt
dxlsr | IBM Domino Data in XML format [1] | N | N | Y | Y | Y | N | N | Lotus_Domino_DXL_Fmt
emlsr [2] | Text Mail (MIME) / Microsoft Outlook Express (Windows 6, MacIntosh 5) | Y | T | T | Y | Y | Y | N | SMTP_Fmt
emxsr | Legato EMailXtender Archives | N | N | Y | Y | N | n/a | N | EMX_Fmt
encase2sr | Expert Witness Compression Format (EnCase) (7) | N | N | Y | Y | N | n/a | N | EnCase_Fmt
encasesr | Expert Witness Compression Format (EnCase) (6) | N | N | Y | Y | N | n/a | N | EnCase_Fmt
entsr | Microsoft Entourage Database (2004) | N | N | Y | Y | Y | Y | N | ENT_Fmt
epubsr | Open Publication Structure eBook (2.0, 3.0) | Y | Y | Y | N | Y | Y | N | Epub_Fmt, iBooks_Fmt
exesr | MSDOS/Windows Executable | N | N | Y | N | N | n/a | N | MS_Executable_Fmt
foliosr | Folio Flat File (3.1) | Y | Y | Y | N | Y | Y | Y | Folio_Flat_Fmt
gdsiisr | GDSII data format | Y | T | T | N | N | N | N | GDSII_Fmt
gifsr | GIF (87, 89) | M | M | N | N | Y | N | N | GIF_87a_Fmt, GIF_89a_Fmt

[1] Supports non-encrypted embedded files only.
[2] This reader supports both clear signed and encrypted S/MIME. KeyView supports S/MIME for PST, EML, MBX, and MSG files.
Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
gitpacksr | Git Packfile | N | N | Y | Y | N | n/a | N | Git_Packfile_Fmt
gwfssr | GroupWise FileSurf email | N | N | Y | Y | Y | N | N | GWFS_Email_Fmt
hl7sr | Health level7 message (2.0) | Y | Y | Y | N | Y | Y | N | Hl7_Fmt
htmlsr [1] | HTML | N | N | N | N | Y | N | N | MS_Excel_HTML_Fmt, MS_Word_HTML_Fmt
htmsr | HTML/XHTML (3, 4) | Y | Y | Y | N | Y [2] | Y | N | HTML_Fmt, Netscape_Bookmark_File_Fmt
hwposr | Haansoft Hangul HWP (2002, 2005, 2007, 2010) | Y | Y | Y | Y | Y | Y | N | HWP_Fmt
hwpsr | Haansoft Hangul HWP (97) | Y | Y | Y | N | Y | Y | N | HWP_Fmt
hwpxsr | Haansoft Hangul HWPX | Y | T | T | N | N | Y | N | HWPX_Fmt
ichatsr | Apple iChat Log (1, AV 2, AV 2.1, AV 3) | Y | Y | Y | N | N | N | N | Apple_iChat_Fmt
icssr | Microsoft Outlook iCalendar (1.0, 2.0) | N | N | Y | Y | Y | Y | N | ICS_Fmt
isosr | ISO-9660 CD Disc Image | N | N | Y | Y | N | n/a | N | ISO_Fmt
iwss13sr [3] | Apple iWork Numbers ('13, '16, '18, iCloud 2018) | Y | T | T | N | N | Y | N | IWSS13_Fmt
iwsssr | Apple iWork Numbers ('08, '09) | Y | Y | Y | N | Y | Y | N | IWSS_Fmt
iwwp13sr [4] | Apple iWork Pages ('13, '16, '18, iCloud 2018) | Y | T | T | N | N | N | N | IWWP13_Fmt
iwwpsr | Apple iWork Pages ('08, '09) | Y | Y | Y | N | Y | Y | N | IWWP_Fmt
jp2000sr | JPEG (2000) | M | M | N | N | Y | N | N | ISO_JPEG2000_JP2_Fmt, ISO_JPEG2000_JPM_Fmt, ISO_JPEG2000_JPX_Fmt, JPEG_2000_JP2_File_Fmt, JPEG_2000_PGX_Fmt, Motion_JPEG_2000_Fmt
jpgsr | JPEG Interchange Format (JFIF) | M | M | N | N | Y | N | N | JPEG_File_Interchange_Fmt
jtdsr | JustSystems Ichitaro (8 to 2013, 2018) | Y | Y | Y | N | P | N | Y | ICHITARO_Compr_Fmt, ICHITARO_Fmt
kpagrdr | Applix Presents/Graphics (4.0, 4.2, 4.3, 4.4) | Y | Y | Y | N | N | N | N | Applix_Graphics_Fmt
kpanirdr | Windows Animated Cursor | N | Y | Y | N | N | N | N | Windows_Animated_Cursor_Fmt
kpbmprdr | Windows Bitmap Image | Y [5] | Y | Y | N | N | N | N | BMP_Fmt
kpCATrdr | CATIA formats (5) | Y | N | N | N | Y | N | N | CATIA_Fmt
kpcdrrdr | CorelDRAW [6] (through 9.0, 10, 11, 12, X3) | N | Y | Y | N | N | N | N | Corel_Draw_Fmt
kpcgmrdr [7] | Computer Graphics Metafile | Y | Y | Y | N | N | N | N | CGM_Binary_Fmt, CGM_Character_Fmt, CGM_ClearText_Fmt
kpchtrdr | Microsoft Excel (2-7) and Lotus 1-2-3 Charts (2-5) | N | Y | Y | N | N | N | N |
kpdcxrdr | DCX Fax System | N | Y | Y | N | N | N | N | DCX_Fmt
kpDWGrdr [8] | Autodesk AutoCAD DWG Drawing (R13 onwards) | Y | Y | Y | N | Y | Y | N | AutoDesk_DWG_Fmt

[1] The htmlsr reader is only available on certain platforms (see htmlsr in the platform differences section).
[2] HTML only supports partial metadata extraction.
[3] The iwss13sr reader is only available on certain platforms (see iwss13sr in the platform differences section).
[4] The iwwp13sr reader is only available on certain platforms (see iwwp13sr in the platform differences section).
[5] Filtering is supported through OCR, which is only available on certain platforms (see Optical Character Recognition in the platform differences section), and is licensed separately.
[6] CDR/CDR with TIFF header.
[7] Files with non-partitioned data are supported.
[8] The kpDWGrdr reader exists to provide DWG support on platforms where kpODArdr is not available (see kpDWGrdr in the platform differences section), but does not support graphics for versions after 2004 or text for versions after 2013.
Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
kpDXFrdr [1] | Autodesk AutoCAD DXF Drawing (R13 onwards) | Y | Y | Y | N | Y | Y | N | AutoCAD_DXF_Binary_Fmt, AutoCAD_DXF_Text_Fmt
kpemfrdr | Enhanced Metafile | Y | Y | Y | N | Y | N | N | Enhanced_Metafile_Fmt
kpepsrdr | Encapsulated PostScript (raster) (TIFF header) | N | Y | Y | N | N | N | N | EPSF_Fmt, Preview_EPSF_Fmt
kpGFLrdr | Omni Graffle | Y | N | N | N | Y | Y | N | Omni_Graffle_XML_Fmt
kpgifrdr | GIF (87, 89) | Y [2] | Y | Y | N | N | N | N | GIF_87a_Fmt, GIF_89a_Fmt
kpHEIFrdr [3] | High Efficiency Image Format image | Y [2] | Y | Y | N | N | N | N | HEIC_Image_Fmt, HEIF_Image_Fmt
kpicordr | Windows Icon Cursor | N | Y | Y | N | N | N | N | Windows_Icon_Fmt
kpIWPG13rdr [4] | Apple iWork Keynote ('13, '16, '18, iCloud 2018) | Y | T | N | N | N | N | N | IWPG13_Fmt
kpIWPGrdr | Apple iWork Keynote (2, 3, '08, '09) | Y | Y | Y | N | Y | Y | N | IWPG13_Fmt, IWPG_Fmt
kpJBIG2rdr | JBIG2 | Y [2] | Y | Y | N | N | N | N | JBIG2_Fmt
kpjp2000rdr | JPEG (2000) | Y [2] | Y | Y | N | N | N | N | ISO_JPEG2000_JP2_Fmt, ISO_JPEG2000_JPM_Fmt, ISO_JPEG2000_JPX_Fmt, JPEG_2000_JP2_File_Fmt, JPEG_2000_PGX_Fmt, Motion_JPEG_2000_Fmt
kpjpgrdr | JPEG Interchange Format (JFIF) | Y [2] | Y | Y | N | N | N | N | JPEG_File_Interchange_Fmt
kpmacrdr | MacPaint | N | Y | Y | N | N | N | N | MacPaint_Fmt
kpmsordr | Microsoft Office Drawing | N | Y | Y | N | N | N | N | MS_Office_Drawing_Fmt
kpODArdr [5] | ODA | Y | Y | Y | N | Y | Y | N | AutoCAD_DXF_Binary_Fmt, AutoCAD_DXF_Text_Fmt, AutoDesk_DWG_Fmt
kpodfrdr | OASIS Open Document Format (1, 2 [7]) | Y | Y | Y | Y [6] | Y | Y | N | ODF_Drawing_Fmt, ODF_Drawing_Template_Fmt, ODF_Presentation_Fmt, ODF_Presentation_Template_Fmt, SO_Drawing_XML_Fmt, SO_Presentation_XML_Fmt
kpp40rdr | Microsoft PowerPoint (98) | Y | Y | Y | N | P [8] | N | N | PowerPoint_Win_Fmt
kpp95rdr | Microsoft PowerPoint Windows (95) | Y | Y | Y | N | P | Y | N | PowerPoint_95_Fmt
kpp97rdr | Microsoft PowerPoint (97-2004) | Y | Y | Y | N | P | Y | Y [8] | PowerPoint_2000_Fmt, PowerPoint_97_Fmt, WPS_Office_PG_Fmt
kppctrdr | Macintosh Raster / QuickDraw (2) | N | Y | Y | N | N | N | N | Mac_PICT_Fmt
kppcxrdr | PC PaintBrush (3) | N | Y | Y | N | N | N | N | PC_Paintbrush_Fmt
kppdf2rdr [9] | Adobe PDF (1.1 to 1.7, 2.0) | N | N | Y | N | N | N | N | PDF_Fmt
kppdfrdr | Adobe PDF (1.1 to 1.7, 2.0) | N | Y | Y | N | N | N | N | PDF_Fmt
kppicrdr | Lotus PIC | Y | Y | Y | N | N | N | N | Lotus_PIC_Fmt

[1] The kpDXFrdr reader exists to provide DXF support on platforms where kpODArdr is not available (see kpDXFrdr in the platform differences section), but does not support graphics for versions after 2004.
[2] Filtering is supported through OCR, which is only available on certain platforms (see Optical Character Recognition in the platform differences section), and is licensed separately.
[3] The kpHEIFrdr reader is only available on certain platforms (see kpHEIFrdr in the platform differences section).
[4] The kpIWPG13rdr reader is only available on certain platforms (see kpIWPG13rdr in the platform differences section).
[5] The kpODArdr reader is only available on certain platforms (see kpODArdr in the platform differences section).
[6] Supported using the olesr embedded objects reader.
[7] Generated by OpenOffice Impress 2.0, StarOffice 8 Impress, and IBM Lotus Symphony Presentation 3.0.
[8] Microsoft PowerPoint Windows only.
[9] kppdf2rdr is an alternate graphic-based reader that produces high-fidelity output but does not support other features such as highlighting or text searching. The kppdf2rdr reader is only available on certain platforms (see kppdf2rdr in the platform differences section).
Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
kppngrdr | Portable Network Graphics | Y [1] | Y | Y | N | N | N | N | APNG_Fmt, PNG_Fmt
kpppxrdr | Microsoft PowerPoint Windows XML (2007 onwards) | Y | Y | Y | Y | Y | Y | Y | MS_PPT_2007_Fmt, MS_PPT_Macro_2007_Fmt
kpprerdr | Lotus Freelance Graphics 2 (2) | Y | Y | Y | N | N | N | N | Freelance_OS2_Fmt, Freelance_Win_Fmt
kpprzrdr | Lotus Freelance Graphics (96, 97, 98, R9, 9.8) | Y | Y | Y | N | N | N | N | Freelance_96_Fmt, Freelance_97_Fmt, Freelance_DOS_Fmt
kpsddrdr | StarOffice Impress (3, 4, 5) | Y | T | N | N | N | N | N | SO_Presentation_Fmt
kpsdwrdr | Lotus AMIDraw Graphics | N | Y | Y | N | N | N | N | Ami_Pro_Draw_Fmt, SO_Text_Fmt
kpsgirdr | SGI RGB Image | N | Y | Y | N | N | N | N | SGI_Image_Fmt
kpshwrdr | Corel Presentations (6, 7, 8, 9, 10, 11, 12, X3) | Y | Y | Y | N | N | N | N | Corel_Presentations_Fmt
kpsunrdr | Sun Raster Image | N | Y | Y | N | N | N | N | Sun_Raster_Fmt
kpTGArdr | Truevision Targa (2) | N | Y | Y | N | N | N | N | Targa_Fmt

[1] Filtering is supported through OCR, which is only available on certain platforms (see Optical Character Recognition in the platform differences section), and is licensed separately.
Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
kptifrdr | TIFF Tagged Image File (through 6.0 [1]) | Y [2] | Y | Y | N | N | N | N | TIFF_Fmt
kpUGrdr | Unigraphics (UG) NX | Y | N | N | N | N | N | N | Unigraphics_NX_Fmt
kpVSD2rdr | Microsoft Visio (4, 5, 2000, 2002, 2003, 2007, 2010 [3]) | Y | Y | Y | N | Y | Y | N | MS_Visio_Fmt
kpVSDXrdr | Microsoft Visio (2013) | Y | Y | Y | Y | Y | Y | N | MS_Visio_2013_Fmt, MS_Visio_2013_Macro_Fmt, MS_Visio_2013_Stencil_Fmt, MS_Visio_2013_Stencil_Macro_Fmt, MS_Visio_2013_Template_Fmt, MS_Visio_2013_Template_Macro_Fmt
kpWEBPrdr [4] | WebP image | Y [2] | Y | Y | N | N | N | N | WebP_Fmt
kpwg2rdr | WordPerfect Graphics 2 (2, 7) | N | Y | Y | N | N | N | N | WordPerfect_Graphics_Fmt

[1] The following compression types are supported: no compression, CCITT Group 3 1-Dimensional Modified Huffman, CCITT Group 3 T4 1-Dimensional, CCITT Group 4 T6, LZW, JPEG (only Gray, RGB and CMYK color space are supported), and PackBits.
[2] Filtering is supported through OCR, which is only available on certain platforms (see Optical Character Recognition in the platform differences section), and is licensed separately.
[3] Viewing and Export use the graphic reader kpVSD2rdr for Microsoft Visio 2003, 2007, and 2010, and vsdsr for all earlier versions. Image fidelity in Viewing and Export is therefore only supported for versions 2003 and above. Filter uses the graphic reader kpVSD2rdr for Microsoft Visio 2003, 2007, and 2010, and vsdsr for all earlier versions.
[4] The kpWEBPrdr reader is only available on certain platforms (see kpWEBPrdr in the platform differences section).
Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
kpwmfrdr | Windows Metafile (3) | Y [1] | Y | Y | N | N | N | N | Windows_Metafile_Fmt, Windows_Metafile_NoHdr_Fmt
kpwpgrdr | WordPerfect Graphics 1 (1) | N | Y | Y | N | N | N | N | WordPerfect_Graphics_Fmt
kpXFDLrdr | Extensible Forms Description Language | Y | Y | Y | N | Y | Y | N | XFDL_Fmt
kvgz | GZIP archive (2) | N | N | Y | N | N | n/a | N | GZ_Compress_Fmt
kvgzsr | GZIP archive (2) | N | N | N | Y | N | n/a | N | GZ_Compress_Fmt
kvhqxsr | BinHex | N | N | Y | Y | N | n/a | N | BinHex_Fmt
kvzee | UNIX Compress | N | N | Y | N | N | n/a | N | Compress_Fmt
kvzeesr | UNIX Compress | N | N | N | Y | N | n/a | N | Compress_Fmt
l123sr | Lotus 1-2-3 (96, 97, R9, 9.8) | Y | Y | Y | N | P | Y | N | Lotus_123_97_Fmt, Lotus_123_Format_Fmt, Lotus_123_R9_Fmt
lasr | Lotus AMI Pro and Write Plus (2, 3) | Y | Y | Y | N | P [2] | Y [2] | Y | Ami_Pro_Fmt, Ami_Pro_StyleSheet_Fmt
lwpsr [3] | Lotus Word Pro and SmartMaster (96, 97, R9) | Y | Y | Y | N | P [4] | N | Y [4] | Lotus_Word_Pro_96_Fmt, Lotus_Word_Pro_97_Fmt

[1] Windows Metafiles can contain both raster images (KeyView file class 4) and vector graphics (KeyView file class 5). Filtering is supported only for vector graphics (class 5).
[2] Lotus AMI Pro only.
[3] The lwpsr reader is only available on certain platforms (see lwpsr in the platform differences section).
[4] Lotus Word Pro only.

Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
lyxsr | LyX Word Processor | Y | T | T | N | N | N | N | LyX_Fmt
lzhsr | Microsoft LZH Compressed Folder | N | N | N | Y | N | n/a | N | LZH_Fmt
macbinsr | MacBinary | N | N | Y | Y | N | n/a | N | MacBinary_Fmt
mbsr | Microsoft Word Macintosh (4, 5, 6, 98) | Y | Y | Y | N | Y | N | Y | MS_Word_Mac_4_Fmt, MS_Word_Mac_Fmt
mbxsr [1] | Text Mail (MIME), Microsoft Outlook Express (Windows 6, Macintosh 5), Mailbox [2] (Thunderbird 1.0, Eudora 6.2) | Y [3] | N | T | Y | Y | Y | N | MIME_Fmt
MCI | Microsoft Media Control Interface | N | N | Y | N | N | N | N | AIFF_Fmt, AU_Audio_Fmt, ISO_QuickTime_Fmt, MIDI_Audio_Fmt, MPEG_Audio_Fmt, MS_Video_Fmt, MS_WAVE_Audio_Fmt, Mobile_QuickTime_Fmt, QuickTime_Fmt

[1] This reader supports both clear signed and encrypted S/MIME. KeyView supports S/MIME for PST, EML, MBX, and MSG files.
[2] KeyView supports MBX files created by Eudora Email and Mozilla Thunderbird. MBX files created by other common mail applications are typically filtered, converted, and displayed.
[3] Text Mail only.

Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
mdbsr | Microsoft Access (95 onwards) | Y | T | T | N | N | Y [1] | N | MS_Access_2000_Fmt, MS_Access_2007_Fmt, MS_Access_95_Fmt, MS_Access_97_Fmt, MS_Access_Fmt
mhtsr | MIME HTML (MHTML) | Y | Y | Y | N | Y | Y | N | MHT_Fmt
mifsr | Adobe FrameMaker Interchange Format (5, 5.5, 6, 7) | Y | Y | Y | N | N | Y | N | Maker_Interchange_Fmt
misr | Microsoft Word Windows (1.0, 2.0) | Y | Y | Y | N | N | N | Y | MS_Word_Win_Fmt
mp3sr | MPEG-1 Audio layer 3 (ID3 v1 and v2) | M | M | Y | N | Y | N | N | MPEG_Audio_Fmt
mpeg4sr | MPEG video | M | N | N | N | Y | N | N | Adobe_Flash_Audio_Book_Fmt, Adobe_Flash_Audio_Fmt, Adobe_Flash_Protected_Video_Fmt, Adobe_Flash_Video_Fmt, Audible_Audiobook_Fmt, ISO_3GPP2_Fmt, ISO_3GPP_Fmt, ISO_IEC_MPEG_4_Fmt, KDDI_Video_Fmt, MPEG4_AVC_Fmt, MPEG4_M4A_Fmt, MPEG4_M4B_Fmt, MPEG4_M4P_Fmt, MPEG4_M4V_Fmt, MPEG4_Sony_PSP_Fmt, MPEG_21_Fmt, NTT_MPEG4_Fmt, Nero_MPEG4_Audio_Fmt, QuickTime_Fmt, Sony_XAVC_Fmt
mppsr | Microsoft Project (2000 onwards) | Y | Y | Y | Y | Y | Y | N | MS_Project_2000_Fmt, MS_Project_2007_Fmt, MS_Project_41_Fmt, MS_Project_4_Fmt, MS_Project_98_Fmt

[1] Charset is not supported for Microsoft Access 95 or 97.

Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
msgsr [1] | Microsoft Outlook (97 onwards), Documentum EMCMF | Y [2] | T [2] | Y [3] | Y | Y | Y [4] | N | EMCMF_Fmt, MS_Outlook_Fmt
mspubsr | Microsoft Publisher (98 to 2016) | Y | T | T | Y | Y | Y | N | MS_Publisher_98_Fmt, MS_Publisher_Fmt
msw6sr | Microsoft Works Word Processor for Windows (6, 2000) | Y | Y | Y | N | N | N | Y | MS_Works_Win_WP_Fmt
mswsr | Microsoft Works Word Processor for Windows (1, 2, 3, 4) | Y | Y | Y | N | N | N | Y | MS_Works_Win_WP_Fmt
multiarcsr [5] | Compressed formats | N | N | Y [6] | Y | N | n/a | N | ARJ_Fmt, RAR5_Fmt, Unix_Archive_Fmt, XZ_Fmt
mw6sr | Microsoft Word for Windows (6, 7, 8, 95) | Y | Y | Y | N | Y | Y | Y | MS_Word_95_Fmt

[1] This reader supports both clear signed and encrypted S/MIME. KeyView supports S/MIME for PST, EML, MBX, and MSG files.
[2] Except Documentum EMCMF.
[3] For Outlook this is Text only.
[4] Returns "Unicode" character set for Outlook version 2003 and up, and "Unknown" character set for previous versions.
[5] The multiarcsr reader is only available on certain platforms (see multiarcsr in the platform differences section). 7zip is supported with the multiarcsr reader on some platforms for Extract.
[6] 7-zip and SUN PEX archives only.

Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
mw8sr | Microsoft Word (97-2004) | Y | Y | Y | Y [1] | Y | Y | Y [2] | MS_Word_2000_Fmt, MS_Word_97_Fmt, WPS_Office_WP_Fmt
mwsr | Microsoft Word PC (4-6) and Windows Write (1-3) | Y | Y | Y | N | N | Y [3] | Y [4] | MS_Windows_Write_Fmt, MS_Word_PC_Driver_Fmt, MS_Word_PC_Fmt, MS_Word_PC_Glossary_Fmt, MS_Word_PC_Misc_Fmt, MS_Word_PC_StyleSheet_Fmt
mwssr | Microsoft Works Spreadsheet (2, 3, 4) | Y | Y | Y | N | N | Y | N | MS_Works_DOS_SS_Fmt, MS_Works_Mac_SS_Fmt, MS_Works_Win_SS_Fmt
mwxsr | Microsoft Word XML (2007 onwards) | Y | Y | Y | Y | Y | Y | Y | MS_Word_2007_Flat_XML_Fmt, MS_Word_2007_Fmt, MS_Word_Macro_2007_Fmt
nnsr | NBI OASys Net Archive | Y | T | T | N | N | N | N | NBI_Net_Archive_Fmt
nsfsr [5] | IBM Lotus Notes database (4, 5, 6.0, 6.5, 7.0, 8.0) | N | N | Y | Y | Y | N | N | Lotus_Notes_NSF_Fmt
oa2sr | Fujitsu Oasys (7) | Y | Y | Y | N | P | N | N | Oasys_Fmt
odfsssr | OASIS Open Document Format (1, 2 [6]) | Y | Y | Y | Y [1] | Y | Y | N | ODF_Spreadsheet_Fmt, ODF_Spreadsheet_Template_Fmt

[1] Supported using the embedded objects reader olesr.
[2] Microsoft Word for Windows only.
[3] Microsoft Windows Write only.
[4] Microsoft Word PC only.
[5] The nsfsr reader is only available on certain platforms (see nsfsr in the platform differences section).
[6] Generated by OpenOffice Calc 2.0, StarOffice 8 Calc, and IBM Lotus Symphony Spreadsheet 3.0.

Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
odfwpsr | OASIS Open Document Format (1, 2 [1]) | Y | Y | Y | Y [2] | Y | Y | Y | ODF_Text_Fmt, ODF_Text_Master_Fmt, ODF_Text_Template_Fmt, ODF_Text_Web_Fmt, SO_Text_XML_Fmt
olesr | Windows Scrap File | N | N | N | Y | Y | n/a | N | Ability_WP_OLE_Fmt, Autodesk_3ds_Max_Fmt, Crystal_Reports_Fmt, FPX_Fmt, MS_AtWork_Fax_Fmt, MS_Binder_Fmt, MicroStation_V8_DGN_Fmt, OLE_Fmt, PageMagic_Fmt, PagePlus_Fmt, PhotoDraw_Mix_Fmt, PowerPoint_Mac_Fmt, SO_Chart_Fmt, SO_Database_Fmt, SO_Math_Fmt, Scrap_Fmt, SolidWorks_Fmt, Solid_Edge_Assembly_Fmt, Solid_Edge_Part_Fmt, Solid_Edge_SheetMetal_Fmt, Windows_Installer_Fmt, Windows_Installer_Patch_Fmt
olmsr | Microsoft Outlook for Macintosh (2011) | N | N | Y | Y | N | Y | N | MS_OutlookOLM_Fmt
onesr | Microsoft OneNote (2007, 2010, 2013, 2016) | Y | Y | Y | Y | N | Y | N | OneNote_Fmt, OneNote_TOC_Fmt
onealtsr | Microsoft OneNote Alternative Packaging Format (2007 onwards) | Y | T | T | Y | N | N | N | OneNote_Alternate_Fmt

[1] Generated by OpenOffice Writer 2.0, StarOffice 8 Writer, and IBM Lotus Symphony Documents 3.0.
[2] Supported using the embedded objects reader olesr.
Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
onmsr | Legato Extender | N | N | Y | Y | Y | N | N | Legato_Extender_ONM_Fmt
oo3sr | Omni Outliner (v3, OPML, OOutline) | Y | Y | Y | N | N | Y | N | OO3_Fmt, OOUTLINE_Fmt, OPML_Fmt
orcsr [1] | Apache ORC (Optimized Row Columnar) data | Y | N | N | N | N | N | N | Apache_ORC_Fmt
parquetsr [2] | Apache Parquet Database Format | Y | N | N | N | Y | N | N | Apache_Parquet_Fmt
pbixsr | Microsoft Power BI Desktop (1.11) | Y | T | T | N | N | Y | N | MS_Power_BI_Fmt
pdf2sr [3] | Adobe PDF (1.1 to 1.7, 2.0) | Y | Y | N | Y | Y | N | N | PDF_Fmt
pdfsr | Adobe PDF (1.1 to 1.7, 2.0) | Y | Y | N | Y [4] | Y | Y | N | PDF_Fmt, Portfolio_PDF_Fmt
pfasr | ASCII Printer and PostScript fonts | Y | T | T | N | N | N | N | PostScript_Font_Fmt, Printer_Font_ASCII_Fmt
pffsr [5] | Microsoft Outlook Offline Storage File (97 onwards) | N | N | Y | Y | Y | Y | N | MS_OutlookOST_Fmt

[1] The orcsr reader is only available on certain platforms (see orcsr in the platform differences section).
[2] The parquetsr reader is only available on certain platforms (see parquetsr in the platform differences section).
[3] The pdf2sr reader is only available on certain platforms (see pdf2sr in the platform differences section).
[4] Includes support for extraction of subfiles from PDF Portfolio documents.
[5] The pffsr reader is only available on certain platforms (see pffsr in the platform differences section).
Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
pfilesr | Rights Management Services (RMS)-protected format | Y [1] | T [1] | T [1] | N | Y | N | N | RMS_Protected_Fmt
pkcs7sr [2] | PKCS #7 cryptographic format | N | N | Y | Y | N | N | N | PKCS_7_Fmt
pngsr | Portable Network Graphics | M | M | N | N | Y | N | N | PNG_Fmt
psdsr | Adobe Photoshop | N | N | N | N | Y [3] | N | N | PSD_Fmt
pstnsr | Microsoft Outlook Personal Folder [4] (97 onwards) | N | N | Y | Y | Y | Y | N | MS_OutlookPST_Fmt
pstsr [5] | Microsoft Outlook Personal Folder [4] (97 onwards) | N | N | Y | Y | Y | N | N | MS_OutlookPST_Fmt

[1] KeyView filters only the internal redirection text. The underlying document text is not accessible without the decryption key.
[2] This reader supports PKCS #7 signed-data encapsulating PKCS #7 data only.
[3] Only XMP metadata is extracted for this format.
[4] KeyView provides several readers capable of processing PST files. The pstsr reader uses the Microsoft Messaging Application Programming Interface (MAPI), works only on Windows, and requires that Microsoft Outlook is installed. The pstxsr reader is available only on certain platforms (see pstxsr in the platform differences section) and does not require Microsoft Outlook. The pstnsr reader is an alternative reader that does not require Microsoft Outlook, for all platforms not supported by pstxsr. For more information about these readers, see "Extract Subfiles from Outlook Personal Folders Files" in Chapter 3.
[5] This reader supports both clear signed and encrypted S/MIME. KeyView supports S/MIME for PST, EML, MBX, and MSG files.

Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
pstxsr | Microsoft Outlook Personal Folder [1] (97 onwards) | N | N | Y | Y | Y | Y | N | MS_OutlookPST_Fmt
pwsr | PRIMEWORD | Y | T | T | N | N | N | N | PRIMEWORD_Fmt
pxlsr | HP PCL XL (PCL 6) | Y | T | T | N | N | N | N | HP_PCL_XL_Fmt
qpssr | Corel Quattro Pro (5, 6, 7, 8) | Y | Y | Y | N | P | Y | N | Quattro_Pro_Win_Fmt
qpwsr | Corel Quattro Pro (X4) | Y | N | Y | N | P | Y | N | QPW_Fmt
rarsr | RAR archive (2.0 through 3.5) | N | N | N | Y | N | n/a | N | RAR_Fmt
riffsr | Microsoft Wave Sound | M | N | N | N | Y | N | N | MS_WAVE_Audio_Fmt
rpmsgsr [2] | Microsoft Outlook Restricted Permission Message | N | N | N | Y [3] | N | Y | N | RPMSG_Fmt
rtfsr | Rich Text Format (1 through 1.7) | Y | Y | Y | N | P | Y | Y | MS_Pocket_Word_Fmt, MS_RTF_Fmt
sassr | SAS7BDAT reader | Y | T | T | N | N | N | N | SAS7BDAT_Fmt

[1] KeyView provides several readers capable of processing PST files. The pstsr reader uses the Microsoft Messaging Application Programming Interface (MAPI), works only on Windows, and requires that Microsoft Outlook is installed. The pstxsr reader is available only on certain platforms (see pstxsr in the platform differences section) and does not require Microsoft Outlook. The pstnsr reader is an alternative reader that does not require Microsoft Outlook, for all platforms not supported by pstxsr. For more information about these readers, see "Extract Subfiles from Outlook Personal Folders Files" in Chapter 3.
[2] The rpmsgsr reader is only available on certain platforms (see rpmsgsr in the platform differences section).
[3] Extraction of embedded email messages is not currently supported.

Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
skypesr | Skype Log (3) | Y | Y | Y | N | N | N | N | Skype_Fmt
sosr | OpenOffice, LibreOffice (1-5), StarOffice (6-9) | Y | T | T | N | Y | Y | N | SO_Spreadsheet_XML_Fmt
starcsr | StarOffice Calc (3, 4, 5) | Y | T | T | N | N | N | N | SO_Spreadsheet_Fmt
starwsr | StarOffice Writer (3, 4, 5) | Y | T | T | N | N | N | N | SO_Text_Fmt
stringssr | Generic 'strings' reader | Y | T | T | N | N | N | N | BeagleWorks_Word_Fmt, CEOwrite_Fmt, CPT_Comm_Fmt, CWK_Fmt, DG_CDS_Fmt, DSA101_Fmt, Data_Point_VistaWord_Fmt, Enable_WP_Fmt, GreatWorks_Word_Fmt, HP_Word_PC_Fmt, IBM_DCF_Script_Fmt, IBM_Writing_Assistant_Fmt, Lotus_Notes_CDF_Fmt, Lyrix_Fmt, MASS_11_Fmt, MS_Works_DOS_WP_Fmt, MS_Works_Mac_WP_Fmt, MacWrite_Fmt, MacWrite_II_Fmt, Multimate_Adv_Fmt, Multimate_Adv_Fnote_Fmt, Multimate_Adv_II_Fmt, Multimate_Adv_II_Fnote_Fmt, Multimate_Fmt, Multimate_Fnote_Fmt, Navy_DIF_Fmt, ODA_Q1_11_Fmt, ODA_Q1_12_Fmt, Office_Writer_Fmt, Psion_TextEd_Fmt, Psion_Word_3_Fmt, Psion_Word_Fmt, Q_A_DOS_Fmt, Q_A_Win_Fmt, Quadratron_Q_One_v1_Fmt, Quadratron_Q_One_v2_Fmt, Quickword_Fmt, SAMNA_Word_IV_Fmt, Symbol_Dynamics_EXP5_Fmt, Symbol_Dynamics_EXP_Fmt, Targon_Word_Fmt, Uniplex_WP_Fmt, Volkswriter_Fmt, WANG_WITA_Fmt, WANG_WPS_Comm_Fmt, WPS_PLUS_Fmt, WordERA_Fmt, WordMARC_Fmt, WordPerfect_Fmt, WordStar_2000_Fmt, WordStar_Fmt, WordStar_for_Windows_Fmt, Word_Connection_Fmt, WriteNow_Fmt, Xerox_860_Comm_Fmt, Xerox_Writer_Fmt
swfsr | Macromedia Flash (through 8.0) | Y | Y | Y | N | N | Y [1] | N | Macromedia_Flash_Fmt
swsr | Informix SmartWare II Word Processor | Y | T | T | N | N | N | N | SmartWare_II_WP_Fmt
tarsr | TAR Tape Archive | N | N | Y | Y | N | n/a | N | TAR_Fmt
tifsr | TIFF Tagged Image File (through 6.0 [2]) | M | M | N | N | Y | N | N | TIFF_Fmt
tnefsr | Transport Neutral Encapsulation Format | N | N | Y | Y | Y | Y | N | TNEF_Fmt
unihtmsr | Unicode HTML | Y | Y | Y | N | Y | Y | N | Unicode_HTML_Fmt
unisr | Unicode Text (3, 4) | Y | Y | Y | N | N | Y | N | Unicode_Fmt
unzip | PKZIP/Zip Compression | N | N | Y [3] | Y | N | n/a | N | Executable_JAR_Fmt, KMZ_Fmt, ODF_Formula_Fmt, ODF_Formula_Template_Fmt, PKZIP_Fmt, Tableau_Packaged_Data_Source_Fmt, Tableau_Packaged_Workbook_Fmt

[1] The character set cannot be determined for versions 5.x and lower.
[2] The following compression types are supported: no compression, CCITT Group 3 1-Dimensional Modified Huffman, CCITT Group 3 T4 1-Dimensional, CCITT Group 4 T6, LZW, JPEG (only Gray, RGB and CMYK color space are supported), and PackBits.
[3] PKZIP, WinZip, and Java Archive only.

Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
uudsr | UU-Encoding (all versions) | N | N | Y | Y | N | n/a | N | UUEncoded_Fmt
vcfsr | Microsoft Outlook vCard Contact (2.1, 3.0, 4.0) | Y | Y | T | N | Y | N | N | VCF_Fmt
vsdsr | Microsoft Visio (4, 5, 2000, 2002, 2003, 2007, 2010 [1]) | Y | Y | Y | Y | Y | Y | N | MS_Visio_Fmt
wkssr | Lotus 1-2-3 (2, 3, 4, 5) | Y | Y | Y | N | N | Y | N | Lotus_123_Worksheet_Fmt
wosr | Corel WordPerfect Windows (5, 5.1) | Y | Y | Y | N | P | Y | Y | WordPerfect_5_Fmt
wp6sr | Corel WordPerfect (6 onwards) | Y | Y | Y | N | P | Y | N | WordPerfect_6_Fmt
wpmsr | Corel WordPerfect Macintosh (1.02, 2, 2.1, 2.2, 3, 3.1) | Y | Y | Y | N | N | Y | N | WordPerfect_Mac_Fmt
xlsbsr | Microsoft Excel Binary Format (2007 onwards) | Y | Y | Y | N | Y | N | N | MS_Excel_Binary_2007_Fmt

[1] Viewing and Export use the graphic reader kpVSD2rdr for Microsoft Visio 2003, 2007, and 2010, and vsdsr for all earlier versions. Image fidelity in Viewing and Export is therefore only supported for versions 2003 and above. Filter uses the graphic reader kpVSD2rdr for Microsoft Visio 2003, 2007, and 2010, and vsdsr for all earlier versions.

Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
xlssr | Microsoft Excel (2.2 to 2004) | Y | Y | Y | Y [1] | Y | Y | Y [2] | Excel_2000_Fmt, Excel_95_Fmt, Excel_97_Fmt, Excel_Chart_Fmt, Excel_Fmt, Excel_Macro_Fmt, Hancom_Cell_2010_Fmt, WPS_Office_SS_Fmt
xlsxsr | Microsoft Excel Windows XML (2007 onwards) | Y | Y | Y | Y | Y | Y | Y | MS_Excel_2007_Fmt, MS_Excel_Macro_2007_Fmt
xmlsr | XML | Y | T | T | N | Y | Y | N | AMF_Fmt, AbiWord_Fmt, Adobe_XML_Data_Package_Fmt, Atom_Syndication_Fmt, CDXML_Fmt, Chemical_Markup_Language_Fmt, Collada_DAE_Fmt, Consolidated_CDA_Fmt, ESzigno_Fmt, FictionBook_Fmt, Grasshopper_GHX_Fmt, JNLP_Fmt, JavaView_JVX_Fmt, KML_Fmt, MAML_Fmt, MARC_XML_Fmt, METS_Fmt, MODS_Fmt, MS_Excel_XML_Fmt, MS_Management_Pack_MPX_Fmt, MS_Visio_XML_Fmt, MS_Word_XML_Fmt, MXML_Fmt, Mathcad_XML_Fmt, Metalink_Fmt, Mozilla_XUL_Fmt, MusicXML_Fmt, Open_Diagnostic_Data_Exchange_Fmt, Open_eBook_Fmt, PDF_XML_Forms_Data_Fmt, PGML_Fmt, PLS_Fmt, RDF_XML_Fmt, RSS_Fmt, Really_Simple_Discovery_Fmt, SBML_Fmt, SMIL_Fmt, SPARQL_Results_Fmt, SRGS_Fmt, SRU_Fmt, SSML_Fmt, SVG_Fmt, SyncML_Fmt, TEI_Fmt, Tableau_Data_Source_Fmt, Tableau_Map_Source_Fmt, Tableau_Preferences_Fmt, Tableau_Workbook_Fmt, Uniform_Office_Fmt, Uniform_Office_Text_Fmt, VTK_XML_Fmt, VoiceXML_Fmt, WML_Fmt, Windows_Audio_Playlist_Fmt, XAML_Browser_Application_Fmt, XBRL_Fmt, XDF_Fmt, XLIFF_Fmt, XML_Fmt, XML_Shareable_Playlist_Fmt, XSLT_Fmt, XSL_FO_Fmt, YIN_Fmt

[1] Supported using the embedded objects reader olesr.
[2] Microsoft Excel for Windows only.

Reader | Description | Filter | Export | View | Extract | Metadata | Charset | H/F | Associated File Formats
xpssr | Microsoft XML Paper Specification | Y | T | T | N | N | N | N | MS_XPS_Fmt
xywsr | XyWrite / Nota Bene (4.12) | Y | Y | Y | N | N | N | N | XyWrite_Fmt
yimsr [1] | Yahoo! Instant Messenger | Y | Y | Y | N | N | N | N | YIM_Fmt
z7zsr | 7-Zip archive (4.57) | N | N | Y | Y | N | n/a | N | Z7Z_Fmt
zstdsr | Zstandard compression | N | N | N | Y | N | n/a | N | Zstandard_Fmt

[1] To successfully use this reader, you must set the KV_YAHOO_ID environment variable to the Yahoo user ID. You can optionally set the KV_OTHER_YAHOO_ID environment variable to the other Yahoo user ID. If you do not set it, "Other" is used by default. If you enter incorrect values for the environment variables, erroneous data is generated.

Appendix C: Platform Differences

Most KeyView features and document readers are available across all platforms. This section describes the supported platforms for certain features that are not available on every platform.
· Feature Differences 208
· Reader Differences 209

Feature Differences

[Table: availability of each feature by platform. Cell values omitted.]

Platforms: Windows x64, x86; Linux x64, x86, AArch64; macOS M1, x64; Solaris x64, x86, SPARC64, SPARC; AIX ppc64, ppc32

Features:
· Filter C++ API
· Filter .NET API
· RMS Decryption
· XMP extraction [1]
· XMP extraction - HTML (HTML_Fmt)
· XMP extraction - additional formats [2]
· Advanced character set detection
· Source code identification
· Optical Character Recognition
· KVOOP privilege reduction
· Out-of-process logging

[1] This refers to formats PDF (PDF_Fmt), PNG (PNG_Fmt), PSD (PSD_Fmt), JPG (JPEG_File_Interchange_Fmt), TIFF (TIFF_Fmt), XML (XML_Fmt) and pFile (RMS_Protected_Fmt).
[2] This refers to formats GIF (GIF_87a_Fmt / GIF_89a_Fmt), jpeg2000 (JPEG_2000_JP2_File_Fmt), SVG (SVG_Fmt), MOV (QuickTime_Fmt), AIFF (AIFF_Fmt), FLV (Flash_Video_Fmt), SWF (Macromedia_Flash_Fmt), MP3 (MPEG_Audio_Fmt), MPEG4 (ISO_IEC_MPEG_4_Fmt), WAV (MS_WAVE_Audio_Fmt), AVI (MS_Video_Fmt), EPS (EPSF_Fmt, Preview_EPSF_Fmt), INDD (InDesign_Fmt), WMA (WMA_Fmt) and WMV (WMV_Fmt).

Reader Differences

[Table: availability of each reader by platform. Cell values omitted.]

Platforms: Windows x64, x86; Linux x64, x86, AArch64; macOS M1, x64; Solaris x64, x86, SPARC64, SPARC; AIX ppc64, ppc32

Readers:
· avrosr (Apache Avro reader)
· cebsr (Founder Chinese E-paper Basic reader)
· htmlsr (HTML reader for XMP extraction)
· iwss13sr (Apple iWork 2013 Numbers reader)
· iwwp13sr (Apple iWork 2013 Pages reader)
· kpDWGrdr (Autodesk AutoCAD Drawing reader for platforms without kpODArdr)
· kpDXFrdr (Autodesk AutoCAD DXF reader for platforms without kpODArdr)
· kpIWPG13rdr (Apple iWork 2013 Keynote reader)
· kpHEIFrdr (High Efficiency Image Format image reader)
· kpODArdr (Autodesk AutoCAD reader)
· kppdf2rdr (alternative graphic-based PDF reader)
· kpWEBPrdr (WebP Format reader)
· lwpsr (Lotus Word Pro reader)
· multiarcsr (multiple archive formats reader)
· nsfsr (Lotus Notes database reader)
· orcsr (Apache ORC reader)
· parquetsr (Apache Parquet reader)
· pdf2sr (alternative PDF reader)
· pffsr (Microsoft Outlook Offline Folders File reader)
· pstsr (MAPI-based PST reader)
· pstnsr (native PST reader for platforms without pstxsr)
· pstxsr (native PST reader)
· rpmsgsr (Microsoft Outlook Restricted Permission Message reader)

This topic shows only those readers that are unavailable on at least one platform. For a complete list of readers, see Document Readers, on page 179.

Appendix D: Character Sets

This section provides information on the handling of character sets in the KeyView suite of products, which includes KeyView Filter SDK, KeyView Export SDK, and KeyView Viewing SDK.

· Multibyte and Bidirectional Support 211
· Coded Character Sets 219

Multibyte and Bidirectional Support

The KeyView SDKs can process files that contain multibyte characters. A multibyte character encoding can use two or more consecutive bytes to represent a single character.

KeyView can also process text from files that contain bidirectional text. Bidirectional text contains both text that is read from left to right, such as Latin-based text, and text that is read from right to left, such as Hebrew and Arabic.

The following table indicates which character encodings are supported by KeyView for each format.
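To make the terms "multibyte" and "bidirectional" concrete, the following minimal sketch uses only the standard Java library. It does not call any KeyView API. The class and method names are hypothetical, and UTF-8 is chosen here only as a familiar example of a multibyte encoding.

```java
import java.nio.charset.StandardCharsets;
import java.text.Bidi;

// Hypothetical illustration class; not part of the KeyView API.
public class MultibyteBidiSketch {

    // Returns the number of bytes needed to encode the text in UTF-8.
    public static int utf8ByteCount(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Returns true if the text contains any right-to-left run,
    // for example Hebrew or Arabic characters.
    public static boolean containsRightToLeft(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        return !bidi.isLeftToRight();
    }

    public static void main(String[] args) {
        String latin = "A";                                // single byte in UTF-8
        String ideograph = "\u65E5";                       // CJK ideograph, multibyte in UTF-8
        String mixed = "KeyView \u05E9\u05DC\u05D5\u05DD"; // Latin text followed by Hebrew

        System.out.println("Bytes for Latin letter: " + utf8ByteCount(latin));
        System.out.println("Bytes for ideograph: " + utf8ByteCount(ideograph));
        System.out.println("Latin only is bidirectional: " + containsRightToLeft("KeyView"));
        System.out.println("Mixed text is bidirectional: " + containsRightToLeft(mixed));
    }
}
```

In this sketch the Latin letter occupies one byte and the ideograph three, and only the string that mixes Latin and Hebrew is reported as containing right-to-left text. Whether KeyView handles such content for a particular file format is determined by the support table, not by this check.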
Multibyte and bidirectional support Format Archive Single-byte 7-Zip (7Z) n/a AD1 Evidence file n/a ADJ n/a B1 n/a BinHex (HQX) n/a Bzip2 (BZ2) n/a EnCase Expert Witness n/a Compression Format (E01) GZIP (GZ) n/a ISO (ISO) n/a Java Archive (JAR) n/a Legato EMailXtender Archive n/a (EMX) MacBinary (BIN) n/a Mac Disk Copy Disk Image n/a Multibyte n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a Bidirectional n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a IDOL KeyView (12.13) Page 211 of 284 Filter SDK Java Programming Guide Appendix D: Character Sets Multibyte and bidirectional support, continued Format Single-byte (DMG) Microsoft Backup File (BKF) n/a Microsoft Cabinet format (CAB) n/a Microsoft Compiled HTML Help n/a (CHM) Microsoft Compressed Folder n/a (LZH) PKZip (ZIP) n/a Microsoft Outlook DBX (DBX) Y Microsoft Outlook Offline Storage Y File (OST) RAR Archive (RAR) n/a Tape Archive (TAR) n/a UNIX Compress (Z) n/a UUEncoding (UUE) n/a Windows Scrap File (SHS) n/a WinZip (ZIP) n/a Binary Executable (EXE) n/a Link Library (DLL) n/a Computer-aided Design AutoCAD Drawing (DWG) Y AutoCAD Drawing Exchange Y (DXF) CATIA formats (CAT) Y Microsoft Visio (VSD) Y Database dBase Database Y Microsoft Access (MDB) Y Microsoft Project (MPP) Y Multibyte n/a n/a n/a n/a n/a Y Y n/a n/a n/a n/a n/a n/a n/a n/a Y Y N Y N Y Y IDOL KeyView (12.13) Bidirectional n/a n/a n/a n/a n/a Y Y n/a n/a n/a n/a n/a n/a n/a n/a Y Y N Y N N N Page 212 of 284 Filter SDK Java Programming Guide Appendix D: Character Sets Multibyte and bidirectional support, continued Format Single-byte Desktop Publishing Microsoft Publisher N Display Adobe Portable Document Y Format (PDF) Graphics Computer Graphics Metafile Y (CGM) Corel DRAW (CDR) n/a DCX Fax System (DCX) Y DICOM Digital Imaging and n/a Communications in Medicine (DCM) Encapsulated PostScript (EPS) Y Enhanced Metafile (EMF) Y Graphic Interchange Format n/a (GIF) JBIG2 n/a JPEG n/a JPEG 2000 n/a Lotus AMIDraw Graphics (SDW) n/a Lotus Pic 
(PIC) n/a Macintosh Raster (PICT/PCT) n/a MacPaint (PNTG) n/a Microsoft Office Drawing (MSO) n/a Multibyte Y Y1 N n/a N n/a N Y n/a n/a n/a n/a n/a n/a n/a n/a n/a Bidirectional N Y N n/a N n/a N N n/a n/a n/a n/a n/a n/a n/a n/a n/a 1Multibyte PDFs are supported, provided the PDF document is created by using either Character ID-keyed (CID) fonts, predefined CJK CMap files, or ToUnicode font encodings, and does not contain embedded fonts. See the Adobe website and the Adobe Acrobat documentation for more information. Any multibyte characters that are not supported are displayed using the replacement character. By default, the replacement character is a question mark (?). To determine the type of font encodings that are used in a PDF, open the PDF in Adobe Acrobat, and select File > Document Info > Fonts. If the Encoding column lists Custom or Embedded encodings, you might encounter problems converting the PDF. IDOL KeyView (12.13) Page 213 of 284 Filter SDK Java Programming Guide Appendix D: Character Sets Multibyte and bidirectional support, continued Format Single-byte Omni Graffle (GRAFFLE) Y PC PaintBrush (PCX) n/a Portable Network Graphics n/a (PNG) SGI RGB Image (RGB) n/a Sun Raster Image (RS) n/a Tagged Image File (TIFF) Y Truevision Targa (TGA) n/a Windows Animated Cursor (ANI) n/a Windows Bitmap (BMP) n/a Windows Icon Cursor (ICO) n/a Windows Metafile (WMF) Y WordPerfect Graphics 1 (WPG) Y WordPerfect Graphics 2 (WPG) Y Mail Documentum EMCMF Format Y Domino XML Language (DXL) Y GroupWise FileSurf Y Legato Extender (ONM) Y Lotus Notes database (NSF) Y Mailbox (MBX) Y Microsoft Entourage Database Y Microsoft Outlook (MSG) Y Microsoft Outlook Express (EML) Y Microsoft Outlook iCalendar Y Microsoft Outlook for Macintosh Y Microsoft Outlook Offline Storage Y File Microsoft Outlook Personal File Y Folders (PST) Multibyte N n/a n/a n/a n/a N n/a n/a n/a n/a Y N N Y Y N Y Y Y Y Y Y Y Y Y Y IDOL KeyView (12.13) Bidirectional N n/a n/a n/a n/a N n/a n/a n/a n/a N N 
N Y N N N Y Y Y Y Y Y Y Y Y Page 214 of 284 Filter SDK Java Programming Guide Appendix D: Character Sets Multibyte and bidirectional support, continued Format Single-byte Multibyte Bidirectional Microsoft Outlook vCard Contact Text Mail (MIME) Y Y Y Transport Neutral Encapsulation Y Y Y Format Multimedia Advanced Systems Format n/a n/a n/a (ASF) Audio Interchange File Format n/a n/a n/a (AIFF) Microsoft Wave Sound (WAV) n/a n/a n/a MIDI (MID) n/a n/a n/a MPEG 1 Audio Layer 3 (MP3) n/a n/a n/a MPEG 1 Video (MPG) n/a n/a n/a MPEG 2 Audio (MPEGA) n/a n/a n/a MPEG 4 Audio (MP4) n/a n/a n/a NeXT/Sun Audio (AU) n/a n/a n/a QuickTime Movie (QT/MOV) n/a n/a n/a Windows Video (AVI) n/a n/a n/a Presentations Apple iWork Keynote (GZ) Y Y N Applix Presents (AG) character set N N 1252 only Corel Presentations (SHW) character set N N 1252 only Extensible Forms Description Y Y N Language (XFD) Lotus Freelance Graphics 2 character set N N (PRE) 850 only Lotus Freelance Graphics (PRZ) Y Japanese, Simple Chinese, N Traditional Chinese, Thai only IDOL KeyView (12.13) Page 215 of 284 Filter SDK Java Programming Guide Appendix D: Character Sets Multibyte and bidirectional support, continued Format Single-byte Multibyte Bidirectional Macromedia Flash (SWF) Y Y N Microsoft OneNote Y Y N Microsoft PowerPoint PC (PPT) Microsoft PowerPoint Windows (PPT) character set Traditional Chinese only 1252 only Y Japanese, Simple Chinese, Traditional Chinese, Korean only N Hebrew only Microsoft PowerPoint Macintosh Y N N (PPT) Microsoft PowerPoint Windows Y Y Y XML 2007 and 2010 (PPTX) OASIS Open Document (ODP) Y Y N OpenOffice Impress (ODP) Y Y N StarOffice Impress (ODP) Y Y N Spreadsheets Apple iWork Numbers (GZ) Y Y N Applix Spreadsheets (AS) character set N N 1252 only Comma Separated Values (CSV) character set N N 1252 only Corel Quattro Pro (QPW/WB3) Y N N Data Interchange Format (DIF) Y Y Y1 Lotus 1-2-3 (123) Y Y Y Lotus 1-2-3 (WK4) Y Y N Lotus 123 Charts (123) Y Y N Microsoft Excel Charts 
(XLS) Y Y N Microsoft Excel Macintosh (XLS) Y N N Microsoft Excel Windows (XLS) Y Y Y2 Microsoft Excel Windows XML Y Y N 2007 (XLSX) Microsoft Office Excel Binary Y Y N Format (XLSB) IDOL KeyView (12.13) Page 216 of 284 Filter SDK Java Programming Guide Appendix D: Character Sets Multibyte and bidirectional support, continued Format Single-byte Multibyte Microsoft Works Spreadsheet Y N (S30/S40) OASIS Open Document (ODS) Y Y OpenOffice Calc (ODS) Y Y StarOffice Calc (ODS) Y Y Text and Markup ANSI (TXT) ASCII (TXT) Y Y Y Y HTML (HTM) Y Y Microsoft Excel Windows XML Y Y 2003 Microsoft Word for Windows XML Y Y 2003 Microsoft Visio XML 2003 Y Y Rich Text Format (RTF) Y Y Unicode HTML Y Y Unicode Text (TXT) XHTML Y Y Y Y XML Word Processing Y Y Adobe Maker Interchange Format (MIF) character set N 1252 only Apple iChat Log (ICHAT) Y Y Apple iWork Pages (GZ) Y Y Applix Words (AW) character set N 1252 only DisplayWrite (IP) Folio Flat File (FFF) character set N 500, 1026 only character set N 1252 only IDOL KeyView (12.13) Bidirectional N N N N Y2 Y2 Y2, 2 Y Y Y Y3 Y 2,3 Y2 Y3 Y N N N N N N Page 217 of 284 Filter SDK Java Programming Guide Appendix D: Character Sets Multibyte and bidirectional support, continued Format Single-byte Multibyte Bidirectional Founder Chinese E-paper Basic Y Y N (CEB) Fujitsu Oasys (OA2) Y Y N Hangul (HWP) Y Y N Health level7 (HL7) Y Y Y IBM DCA/RTF (DC) character sets N N 500, 1026 only JustSystems Ichitaro (JTD) Y Y N Lotus AMI Pro (SAM) Y Simple Chinese, Traditional Y Chinese, Japanese, Thai only Lotus AMI Professional Write Y Plus (AMI) Lotus Word Pro (LWP) Y Simple Chinese, Traditional N Chinese, Japanese, Thai only Y Y3 Lotus SmartMaster (MWP) Y Y N Microsoft Word PC (DOC) character set N N 1252 only Microsoft Word Windows V1-2 Y N (DOC) Microsoft Word Windows V6, 7, Y Y 8, 95 (DOC) Microsoft Word Windows V97 Y Y through 2003 (DOC) Microsoft Word Windows XML Y Y 2007 and 2010 (DOCX) Microsoft Word Macintosh (DOC) Y N N Hebrew only3 Y3 Y3 Y3 
Multibyte and bidirectional support, continued
Format | Single-byte | Multibyte | Bidirectional
Microsoft Works (WPS) | Y | Japanese only | N
Microsoft Write (WRI) | Y | Japanese only | N
OASIS Open Document (ODT) | Y | Y | N
Omni Outliner (OO3) | Y | Y | N
OpenOffice Writer (ODT) | Y | Y | N
Open Publication Structure eBook (EPUB) | Y | Y | Y
StarOffice Writer (ODT) | Y | Y | N
Skype Log (DBB) | Y | Y (null-terminated charsets) | N
WordPad (RTF) | Y | Y | Y
WordPerfect Linux (WPS) | Y | N | N
WordPerfect Macintosh (WPS) | Y | N | N
WordPerfect Windows (WO) | Y | N | N
XML Paper Specification (XPS) | Y | Y | N
XYWrite Windows (XY4) | character set 1252 only | N | N
Yahoo! Instant Messenger (DAT) | Y | Y (null-terminated charsets) | N

1 The text direction in the output file might not be correct.
2 In Export SDK, a bidirectional right-to-left (RTL) tag is extracted from this format and included in the direction element (<dir=RTL>) of the output.

Coded Character Sets

This section lists the coded character sets and indicates which of them you can specify as the target character set. The coded character sets are enumerated in kvcharset.h and defined in the Filter class.

Code Character Sets
Coded Character Set | Description | Can be set as target charset?
KVCS_UNKNOWN | Unknown character set | N
KVCS_SJIS | Japanese (uses multibyte encoding), cp932 | Y
KVCS_GB | Simplified Chinese (China, Singapore, Malaysia), cp936 | Y
KVCS_BIG5 | Traditional Chinese (Taiwan, Hong Kong, Macau), cp950 | Y
KVCS_KSC | Korean, cp949 | Y
KVCS_1250 | Windows Latin 2 (Central Europe) | Y
KVCS_1251 | Windows Cyrillic (Slavic) | Y
KVCS_1252 | Windows Latin 1 (ANSI) | Y
KVCS_1253 | Windows Greek | Y
KVCS_1254 | Windows Latin 5 (Turkish) | Y
KVCS_1255 | Windows Hebrew | Y
KVCS_1256 | Windows Arabic | Y
KVCS_1257 | Windows Baltic Rim | Y
KVCS_1258 | Windows Vietnamese | Y
KVCS_8859_1 | ISO 8859-1 Latin 1 (Western Europe, Latin America) | Y
KVCS_8859_2 | ISO 8859-2 Latin 2 (Central Eastern Europe) | Y
KVCS_8859_3 | ISO 8859-3 Latin 3 (S.E. Europe) | Y
KVCS_8859_4 | ISO 8859-4 Latin 4 (Scandinavia/Baltic) | Y
KVCS_8859_5 | ISO 8859-5 Latin/Cyrillic | Y
KVCS_8859_6 | ISO 8859-6 Latin/Arabic | Y
KVCS_8859_7 | ISO 8859-7 Latin/Greek | Y
KVCS_8859_8 | ISO 8859-8 Latin/Hebrew | Y
KVCS_8859_9 | ISO 8859-9 Latin/Turkish | Y
KVCS_8859_14 | ISO 8859-14 | Y
KVCS_8859_15 | ISO 8859-15 | Y
KVCS_437 | DOS Latin US | Y
KVCS_737 | DOS Greek | Y
KVCS_775 | DOS Baltic Rim | Y
KVCS_850 | DOS Latin 1 | Y
KVCS_851 | DOS Greek | Y
KVCS_852 | DOS Latin 2 | Y
KVCS_855 | DOS Cyrillic | Y
KVCS_857 | DOS Turkish | Y
KVCS_860 | DOS Portuguese | Y
KVCS_861 | DOS Icelandic | Y
KVCS_862 | DOS Hebrew | Y
KVCS_863 | DOS Canadian French | Y
KVCS_864 | DOS Arabic | Y
KVCS_865 | DOS Nordic | Y
KVCS_866 | DOS Cyrillic Russian | Y
KVCS_869 | DOS Greek 2 | Y
KVCS_874 | Thai | Y
KVCS_PDFMACDOC | PDF MAC DOC | N
KVCS_PDFWINDOC | PDF WIN DOC | N
KVCS_STDENC | Adobe Standard Encoding | N
KVCS_PDFDOC | Adobe standard PDF character set | N
KVCS_037 | EBCDIC code page 037 | Y
KVCS_1026 | EBCDIC code page 1026 | Y
KVCS_500 | EBCDIC code page 500 | Y
KVCS_875 | EBCDIC code page 875 | Y
KVCS_LMBCS | Lotus multibyte character set Group 1 and Group 2 | N
KVCS_UNICODE | Unicode, UCS-2 | Y
KVCS_UTF16 | 16-bit Unicode transformation format | Y
KVCS_UTF8 | 8-bit Unicode transformation format | Y
KVCS_UTF7 | 7-bit Unicode transformation format | Y
KVCS_2022_JP | ISO 2022-JP, Japanese mail and news safe encoding (JIS-7) | N
KVCS_2022_CN | ISO 2022-CN, Chinese mail and news safe encoding | N
KVCS_2022_KR | ISO 2022-KR, Korean mail and news safe encoding | N
KVCS_WP6X | Word Perfect 6.x and higher character mapping | N
KVCS_10000 | Western European (Macintosh) | Y
KVCS_KSC5601 | Unified Hangul | Y
KVCS_GB2312 | Simplified Chinese (China, Singapore, Hong Kong) | Y
KVCS_GB12345 | Traditional Chinese (China) - analogue of GB2312 | Y
KVCS_CNS11643 | Traditional Chinese - Taiwan. Supplement to Big5 | Y
KVCS_JIS0201 | Japanese - contains ASCII character set (JIS-Roman) | N
KVCS_JIS0212 | Japanese. Supplement to JIS0208. | Y
KVCS_EUC_JP | Japanese Extended UNIX Code | Y
KVCS_EUC_GB | Simplified Chinese Extended UNIX Code | Y
KVCS_EUC_BIG5 | Traditional Chinese Extended UNIX Code | N
KVCS_EUC_KSC | Korean Extended UNIX Code | N
KVCS_424 | EBCDIC Hebrew | N
KVCS_856 | PC Hebrew (old) | N
KVCS_1006 | IBM AIX Pakistan (Urdu) | N
KVCS_KOI8R | Cyrillic (Russian) | Y
KVCS_PDF_JAPAN1 | Adobe-Japan1-2 character collection | N
KVCS_PDF_KOREA1 | Adobe-Korea1-0 character collection | N
KVCS_PDF_GB1 | Adobe-GB1-3 character collection | N
KVCS_PDF_CNS1 | Adobe-CNS1-2 character collection | N
KVCS_2022_JP_8 | ISO 2022-JP, Japanese mail and news safe encoding (JIS8) | N
KVCS_720 | Arabic DOS-720 | Y
KVCS_VISCII | Vietnamese VISCII | Y
KVCS_8859_10 | ISO 8859-10 (Latin 6 Nordic) | Y1
KVCS_8859_13 | ISO 8859-13 (Latin 7 Baltic) | Y1
KVCS_57002 | ISCII Devanagari (x-iscii-de) | Y1
KVCS_57003 | ISCII Bengali (x-iscii-be) | Y1
KVCS_57004 | ISCII Tamil (x-iscii-ta) | Y1
KVCS_57005 | ISCII Telugu (x-iscii-te) | Y1
KVCS_57006 | ISCII Assamese (x-iscii-as) | Y1
KVCS_57007 | ISCII Oriya (x-iscii-or) | Y1
KVCS_57008 | ISCII Kannada (x-iscii-ka) | Y1
KVCS_57009 | ISCII Malayalam (x-iscii-ma) | Y1
KVCS_57010 | ISCII Gujarathi (x-iscii-gu) | Y1
KVCS_57011 | ISCII Panjabi (x-iscii-pa) | Y1
KVCS_GB18030b2 | Reserved for internal use | n/a
KVCS_GB18030 | GB18030 (Chinese 4-byte character set) | Y
KVCS_8859_11 | ISO 8859-11 (Thai) | Y
KVCS_8859_16 | ISO 8859-16 (Latin-10 South-Eastern Europe) | Y
KVCS_ARABICMAC | Arabic Mac (x-mac-arabic) | Y
KVCS_KOI8U | Cyrillic (KOI8U Ukrainian) | Y
KVCS_HZGB2312 | The 7-bit representation of GB 2312 / RFC 1842 | n/a
KVCS_UTF32 | 32-bit Unicode transformation format | Y

1 The character set cannot be forced as output in Export SDK and Viewing SDK because the character set is not supported by the major browsers.

Appendix E: Extract and Format Lotus Notes Subfiles

This section describes how to create XML templates to alter the appearance of extracted Lotus mail note subfiles so that they maintain the look and feel of the original notes.

· Overview
· Customize XML Templates
· Template Elements and Attributes
· Date and Time Formats

Overview

KeyView uses the NSF reader, nsfsr, to extract Lotus database files, and places Lotus mail notes in subfiles. The NSF reader uses a set of default XML templates to extract the notes and apply formatting, thereby approximating the look and feel of the original notes.

In some cases, you might need to customize the XML templates, for instance if your notes contain custom data. In such cases, you can modify the existing XML templates or create your own.

During extraction, the NSF reader loads all XML files in the NSFtemplates directory and its subdirectories (except for the NSFtemplates\images directory, which is reserved for images). During initialization, the KeyView XML parser verifies the XML templates. If the templates contain any invalid XML, elements, or attributes, initialization fails and errors are recorded in the nsfsr.log file.

Customize XML Templates

XML templates are enabled by default. In most cases, the default templates should be sufficient; however, you can customize them or create your own as required.

To customize XML templates for Lotus note extraction

1. Modify the template files in the following directory.

   install\OS\bin\NSFtemplates

   The main.xml file must exist in the NSFtemplates directory. It is the top-level template file that extracts all subfiles, usually by calling other templates.
2.
Make sure that any modifications or additional XML files conform to the supported elements and attributes described in Template Elements and Attributes.
3. Extract the Lotus database file.

Use Demo Templates

For testing purposes, you can extract notes by using a set of demo templates. They demonstrate the proper usage of all the XML elements and attributes, which the default templates do not cover completely. The demo templates are available at:

install\OS\bin\NSFtemplates

To use the demo XML templates

1. In the formats.ini file, set the following parameter.

   [nsfsr]
   UseDemoTemplate=1

2. In the main.xml file, uncomment the following section.

   <ifini name="UseDemoTemplate" text="1">
     <call file="demo.xml"/>
     <quit/>
   </ifini>

Use Old Templates

For testing purposes, you can extract notes by using legacy templates, which produce MHTML output. You can generate similar output by disabling the XML templates, but using the old templates enables you to see the XML code and compare it to the standard and demo templates.

To use the old XML templates

1. In the formats.ini file, set the following parameter.

   [nsfsr]
   UseOldTemplate=1

2. In the main.xml file, uncomment the following section.

   <ifini name="UseOldTemplate" text="1">
     <call file="default_old.xml"/>
     <quit/>
   </ifini>

Disable XML Templates

For testing purposes, you can disable XML templates, in which case KeyView extracts the notes in MHTML format. You can then compare the MHTML that the NSF reader produces directly with the MHTML that it produces through the XML templates.

To disable XML templates

1. In the formats.ini file, set the following parameter.
   [nsfsr]
   ExtractByTemplate=0

Template Elements and Attributes

This section lists the valid XML elements and attributes that you can use when creating or modifying templates. See the demo templates for examples.

Conditional Elements

The following table lists the valid conditional elements.

Conditional elements

<keyview>
   The KeyView XML template container ("root") element.

<if*>
   If the condition from the comparison is true, process the XML. Conditions can be nested up to 25 levels deep.
   Attributes
   · name. (Required) The name of the main item to compare to item or text.
   · item. (Required if no text) The name of the item to compare to the item specified by name.
   · text. (Required if no item) The text to compare to the item specified by name.

<ifex>, <ifnx>
   Respectively, if the name item exists and has a text value, or does not. The Notes item might have a value that cannot be converted to text, such as an image.

<ifeq>, <ifne>, <iflt>, <ifle>, <ifgt>, <ifge>
   Respectively, if text ==, !=, <, <=, >, >=. Text comparison uses a case-insensitive string compare.

<iftdeq>, <iftdne>, <iftdlt>, <iftdle>, <iftdgt>, <iftdge>
   Respectively, if time/date ==, !=, <, <=, >, >=. Time/date comparison converts dates to text in local time using the Notes default, TZFMT_NEVER, because Notes also sometimes converts fields to text internally. For example:
   text="06/30/2005 02:52:04 PM"

<iftzeq>, <iftzne>
   Respectively, if the time zone equals or does not equal the comparison text, for example CDT, EST, and so on.

<ifini>
   If the value of the INI option specified in name equals the text value.

<else>
   If the condition from the last <if> or <switch> was false, process XML.

<switch>
   If a name value exists, process XML.
   Attributes
   · name. (Required) The name of the main item to compare in <case> subelements.
<case>
   If the comparison condition is true, process XML, then stop processing the rest of <switch>.
   Attributes
   · text. (Required) The text to compare to the name item of <switch>.

<default>
   If all <case> conditions were false, process XML. This element must be the last element in <switch>, after all the <case> elements. Any <case> elements after the <default> element are ignored.

<for>
   If a name value exists, process XML. Process for each part of the name item.
   Attributes
   · name. (Required) The name of the main item.
   · max. (Optional) The maximum index to process. By default, all are processed.

<index>
   Output <for> loop index (1-based). <index> is only valid within a <for> element.

Control Elements

The following table lists the valid control elements.

Control elements

<call>
   Call another XML template. You can nest templates up to 10 levels deep.
   Attributes
   · file. (Required) The template file name. This name must be unique.

<log>
   Log message to the NSF log file.
   Attributes
   · text. (Required) The text to log.
   · type. (Optional) The type of log message. The following values are valid:
      o ERROR
      o WARN
      o INFO
      o DIAG (the default option)
      o DEBUG
      o DUMP

<quit>
   Stop processing the template. Exits without error.
   Attributes
   · text. (Optional) The text to log.
   · type. (Optional) The type of log message. See <log>, above.

<stop>
   Stop processing the template. Exits with an ERROR log message.
   Attributes
   · text. (Required) The text to log.

Data Elements

The following table lists the valid data elements.

Data elements

<text>
   Output text.
   Attributes
   · name. (Required if there is no parent) The name of the item to output.
<rich>
   Output rich text (MHTML). Images are output in the next part or parts of the MHTML, after the first <HTML> part.
   Attributes
   · name. (Required if there is no parent) The name of the item to output.

<body>
   Output the message body in rich text (MHTML). As with <rich>, above, images are output in the next part or parts of the MHTML.

<form>
   Output the message form (usually $Body field) in rich text (MHTML).
   Attributes
   · name. (Required if there is no parent) The name of the item to output.

<addr>
   Output an address.
   Attributes
   · name. (Required if there is no parent) The name of the item to output.
   · type. (Optional) The type of address to output. Set this attribute to CN (Common Name), which is the only supported type.

<name>
   Output the name of the last name item, or in other words the current main item. The item must exist.

<format>
   Set the default format for <date> and <date_kv>. This element does not set the <text> format. See Date and Time Formats for a list of all Notes and KeyView date and time formats and integer values.
   Attributes
   · format. (Optional. Omit to reset to defaults) The Notes and KeyView date and time format. You can set the following formats:
      o TD=int. The Time Date format (TDFMT_*)
      o TS=int. The Time Show format (TSFMT_*)
      o TT=int. The Time Time format (TTFMT_*)
      o TZ=int. The Time Zone format (TZFMT_*)
      o KV=int. The KeyView date and time format
     where int is an integer value that corresponds to the desired format. Separate multiple formats with commas. For example:
     format="TD=0,TS=2,TT=1,TZ=1,KV=55"

<date>
   Output a Notes date.
   Attributes
   · name. (Required if there is no parent) The name of the item to output.
   · format. (Optional) See <format>, above. You can set the following values:
      o TD
      o TS
      o TT
      o TZ

<date_kv>
   Output a KeyView date.
   Attributes
   · name. (Required if there is no parent) The name of the item to output.
   · format. (Optional) See <format>, above. You can set the following values:
      o TZ
      o KV

<time>
   Output a time range, for example 1 hour, 30 minutes.
   Attributes
   · name. (Required if there is no parent) The item name of the start date or time.
   · item. (Required) The item name of the end date or time.

<zone>
   Output a Notes time zone mnemonic, for example MST.
   Attributes
   · name. (Required if there is no parent) The name of date item to output.

<zone_utc>
   Output a time zone as UTC, for example (UTC-06:00).

<logo>
   Output the mail header logo. The image link is included in the output; the actual image is output to a different part of the MHTML subfile.

<image>
   Output an image. The image link is included in the output; the actual image is output to the MHTML next part, as with <rich> and <body>, above.

<image_uri>
   Output an image URI, in quotation marks. The actual image is output to a different part of the MHTML subfile.
   Attributes
   · link. (Required if there is no file) The image link, such as a form or title name. For example:
     link="StdNotesLtr0"
   · file. (Required if there is no link) The name of the image file. The file must exist in the ../../templates/images directory. For example:
     file="boxcheck.gif"

Date and Time Formats

This section lists the supported Notes and KeyView date and time formats for use with <format>, <date>, and <date_kv>.
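A format attribute value such as TD=0,TS=2,TT=1,TZ=1,KV=55 is a comma-separated list of key=integer pairs, so it can be checked before it is written into a template. The following is a minimal sketch only: TemplateFormatAttribute and its parse method are hypothetical names used for illustration and are not part of the KeyView API. It accepts only the five keys named for the format attribute (TD, TS, TT, TZ, KV) and does not check the integer ranges.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not part of the KeyView API. */
final class TemplateFormatAttribute {
    // Component keys documented for the format attribute.
    private static final List<String> KEYS = Arrays.asList("TD", "TS", "TT", "TZ", "KV");

    /** Parses a value such as "TD=0,TS=2,TT=1,TZ=1,KV=55" into key/integer pairs. */
    static Map<String, Integer> parse(String value) {
        Map<String, Integer> components = new LinkedHashMap<>();
        for (String part : value.split(",")) {
            String[] pair = part.trim().split("=", 2);
            if (pair.length != 2 || !KEYS.contains(pair[0])) {
                throw new IllegalArgumentException("Unsupported format component: " + part);
            }
            // Integer.parseInt rejects non-numeric values with a NumberFormatException.
            components.put(pair[0], Integer.parseInt(pair[1].trim()));
        }
        return components;
    }

    public static void main(String[] args) {
        System.out.println(parse("TD=0,TS=2,TT=1,TZ=1,KV=55"));
    }
}
```

A stricter version could also compare each integer against the values listed in the format tables in this section.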
Lotus Notes Date and Time Formats

This section lists supported Lotus Notes date and time formats, and the integer values that specify each one.

Lotus Notes date and time formats
Format | Integer Value | Description
TDFMT_FULL | 0 | (The Notes default) Year, month, and day
TDFMT_CPARTIAL | 1 | Month and day, year if not this year
TDFMT_PARTIAL | 2 | Month and day
TDFMT_DPARTIAL | 3 | Year and month
TDFMT_FULL4 | 4 | Four-digit year, month, and day
TDFMT_CPARTIAL4 | 5 | Month and day, four-digit year if not this year
TDFMT_DPARTIAL4 | 6 | Four-digit year and month
TTFMT_FULL | 0 | (Notes default) Hour, minute, and second
TTFMT_PARTIAL | 1 | Hour and minute
TTFMT_HOUR | 2 | Hour
TZFMT_NEVER | 0 | (Notes default) All time zones are converted to the current time zone
TZFMT_SOMETIMES | 1 | Show only when outside the current time zone
TZFMT_ALWAYS | 2 | Show for all time zones
TSFMT_DATE | 0 | Date
TSFMT_TIME | 1 | Time
TSFMT_DATETIME | 2 | (The Notes default) Date and time
TSFMT_CDATETIME | 4 | Date and time, or time today or time yesterday

KeyView Date and Time Formats

This section lists KeyView date and time formats.
The KeyView formats use the following syntax:

Month
   Month = full month name
   Mon = abbreviated month name
   m = month (number)
   mm = two-digit month (leading 0)
Weekday
   Weekday = full weekday name
   Wday = abbreviated weekday name
Year
   yy = two-digit year
   yyyy = four-digit year
Day
   d = day (number)
   dd = two-digit day (leading 0)
Time
   h = 12-hour
   H = 24-hour
   m = minutes
   s = seconds
   P = AM/PM
   p = am/pm
Separators
   _ = space
   c = comma
   s = slash
   a = dash
   o = dot

KeyView date and time formats
Format | Output | Integer Value

12-Hour and 24-Hour Time Formats
KVDTF_P | P | 1
KVDTF_P_hmm | P h:mm | 2
KVDTF_hmm_P | h:mm P | 3
KVDTF_P_hhmm | P hh:mm | 4
KVDTF_hhmm_P | hh:mm P | 5
KVDTF_P_hmmss | P h:mm:ss | 6
KVDTF_hmmss_P | h:mm:ss P | 7
KVDTF_P_hhmmss | P hh:mm:ss | 8
KVDTF_hhmmss_P | hh:mm:ss P | 9
KVDTF_Hmm | H:mm | 10
KVDTF_HHmm | HH:mm | 11
KVDTF_mmss | mm:ss | 12
KVDTF_Hmmss | H:mm:ss | 13
KVDTF_HHmmss | HH:mm:ss | 14

Numerical Date Formats with Slashes
KVDTF_mmsdd | mm/dd | 15
KVDTF_msdsyy | m/d/yy | 16
KVDTF_mmsddsyy | mm/dd/yy | 17
KVDTF_mmsddsyyyy | mm/dd/yyyy | 18
KVDTF_ddsmm | dd/mm | 19
KVDTF_ddsmmsyy | dd/mm/yy | 20
KVDTF_ddsmmsyy_Hmm | dd/mm/yy H:mm | 21
KVDTF_ddsmm_P_hmm | dd/mm P h:mm | 22
KVDTF_ddsmm_hmm_P | dd/mm h:mm P | 23
KVDTF_ddsmm_P_hhmm | dd/mm P hh:mm | 24
KVDTF_ddsmm_hhmm_P | dd/mm hh:mm P | 25
KVDTF_ddsmmsyy_P_hmm | dd/mm/yy P h:mm | 26
KVDTF_ddsmmsyy_hmm_P | dd/mm/yy h:mm P | 27
KVDTF_ddsmmsyy_P_hmmss | dd/mm/yy P h:mm:ss | 28
KVDTF_ddsmmsyy_hmmss_P | dd/mm/yy h:mm:ss P | 29
KVDTF_ddsmmsyy_P_hhmmss | dd/mm/yy P hh:mm:ss | 30
KVDTF_ddsmmsyy_hhmmss_P | dd/mm/yy hh:mm:ss P | 31
KVDTF_yysmmsdd_P_hhmmss | yy/mm/dd P hh:mm:ss | 32
KVDTF_yysmmsdd_hhmmss_P | yy/mm/dd hh:mm:ss P | 33
KVDTF_msdsyy_Hmm | m/d/yy H:mm | 34
KVDTF_mmsddsyy_Hmm | mm/dd/yy H:mm | 35
KVDTF_msdsyy_P_hmm | m/d/yy P h:mm | 36
KVDTF_msdsyy_hmm_P | m/d/yy h:mm P | 37
KVDTF_mmsddsyy_hmm_P | mm/dd/yy h:mm P | 38
KVDTF_mmsdd_P_hhmm | mm/dd P hh:mm | 39
KVDTF_mmsdd_hhmm_P | mm/dd hh:mm P | 40
KVDTF_mmsddsyy_P_hhmmss | mm/dd/yy P hh:mm:ss | 41
KVDTF_mmsddsyy_hhmmss_P | mm/dd/yy hh:mm:ss P | 42
KVDTF_msd | m/d | 43
KVDTF_yysm | yy/m | 44
KVDTF_yysmm | yy/mm | 45
KVDTF_yysmsd | yy/m/d | 46
KVDTF_yysmmsdd | yy/mm/dd | 47
KVDTF_yyyysmmsdd | yyyy/mm/dd | 48

Numerical Date Formats with Dashes
KVDTF_ddammayy | dd-mm-yy | 49
KVDTF_mmadd | mm-dd | 50
KVDTF_mmayy | mm-yy | 51
KVDTF_yyammadd | yy-mm-dd | 52
KVDTF_yyyyammadd | yyyy-mm-dd | 53
KVDTF_yyyyammaddaHHmmss | yyyy-mm-dd-HH:mm:ss | 54

Numerical Date Formats with Dots
KVDTF_yyomod | yy.m.d | 55
KVDTF_yyommodd | yy.mm.dd | 56
KVDTF_mod | m.d | 57
KVDTF_mmodd | mm.dd | 58

Numerical and String Date Formats with Dashes, Commas, and Spaces
KVDTF_ddaMon | dd-Mon | 59
KVDTF_daMonayy | d-Mon-yy | 60
KVDTF_ddaMonayy | dd-Mon-yy | 61
KVDTF_ddaMonayyyy | dd-Mon-yyyy | 62
KVDTF_Mon | Mon | 63
KVDTF_Monayy | Mon-yy | 64
KVDTF_Monayyyy | Mon-yyyy | 65
KVDTF_Monaddayy | Mon-dd-yy | 66
KVDTF_yyammadd_P_hhmmss | yy-mm-dd P hh:mm:ss | 67
KVDTF_mmadd_P_hhmm | mm-dd P hh:mm | 68
KVDTF_Mon_yy | Mon yy | 69
KVDTF_Monc_yy | Mon, yy | 70
KVDTF_Month | Month | 71
KVDTF_Monthayy | Month-yy | 72
KVDTF_Month_yy | Month yy | 73
KVDTF_Monthc_yy | Month, yy | 74
KVDTF_Monthayyyy | Month-yyyy | 75
KVDTF_Month_yyyy | Month yyyy | 76
KVDTF_Monthc_yyyy | Month, yyyy | 77
KVDTF_Mon_dc_yyyy | Mon d, yyyy | 78
KVDTF_d_Monc_yyyy | d Mon, yyyy | 79
KVDTF_yyyy_Mon_d | yyyy Mon d | 80
KVDTF_Month_dc_yyyy | Month d, yyyy | 81
KVDTF_d_Monthc_yyyy | d Month, yyyy | 82
KVDTF_yyyy_Month_d | yyyy Month d | 83

Weekday Date Formats
KVDTF_Wday | Wday | 84
KVDTF_Weekday | Weekday | 85
KVDTF_Wdayc_Mon_dc_yyyy | Wday, Mon d, yyyy | 86
KVDTF_Weekdayc_Month_dc_yyyy | Weekday, Month d, yyyy | 87
KVDTF_Weekdayc_d_Monthc_yyyy | Weekday, d Month, yyyy | 88

Appendix F: File Format Detection

This section describes how file formats are detected in Filter SDK.

· Introduction
· Extract Format Information
· Determine Format Support
· Translate Format Information
· Determine a Document Reader
· Additional Format Information

Introduction

The KeyView format detection module (kwad) detects a file's format, and reports the information to the API, which in turn reports the information to the developer's application. If the detected format is supported by the KeyView SDK, the detection module also loads the appropriate structured access layer and document reader for further processing. For a list of supported formats, see Document Readers.

Extract Format Information

You can extract format information from a document by using one of the getDocFormatInfo methods. These methods extract the major format, file class, version, and document attributes, and populate the DocFormatInfo class. They return the format information as a string.

The format information that you can extract is listed in the header file adinfo.h. For information on how to translate the extracted format information, see Translate Format Information.

Determine Format Support

After the file format is extracted, the detection module uses the formats.ini file to determine whether the format is supported by KeyView, and the appropriate structured access layer and reader to load.

The formats.ini file is in the directory install\OS\bin, where install is the path name of the Filter installation directory and OS is the name of the operating system. It contains the following information:

· Coded format information.
  To translate this information, see Translate Format Information.
· The reader associated with each format. See Determine a Document Reader.
· Configuration parameters.
· Locale settings for internal use.

Example formats.ini file entries

123=mw
152=xyw
178=wp6
189=mw6
2=af
200=pdf
205=mb
210=htm
251=htm

NOTE: The formats.ini file applies to all formats except graphics. Detection of graphics formats is handled by an internal module named KeyView Picture Interchange Format (KPIF).

Refine Detection of Text Files

During text detection, KeyView analyzes the first 1 kB and last 1 kB of data in a document. If less than 10% of that data consists of non-ASCII characters, KeyView detects the document as a text file. However, depending on the type of documents you are working with, the default settings might not provide the desired level of accuracy.

Configuration flags enable you to change the amount of data to read at the end of a file, the percentage of non-ASCII characters permitted in a text file, and whether to use or ignore the file extension to determine the document format.

Change the Amount of File Data to Read

During file detection, KeyView reads characters from the beginning and end of a file. By default, it reads the first and last 1,024 bytes of data. Large text files might contain many irrelevant characters at the end of a file, so KeyView might not accurately detect the file format. You can set a configuration flag to increase the amount of data to read from the end of a file during detection.

To change the amount of data to read during detection

· In the formats.ini file, set the following flag in the detection_flags section:

  [detection_flags]
  non_ascii_chars_end_block_size=kB

  where kB is the number of kilobytes to read from the end of the file, from 0 to 10. The default value is 1.
  NOTE: The file size must be greater than the value specified in the flag. If the flag value is greater than the file size, KeyView does not use the flag.

Change the Percentage of Allowed Non-ASCII Characters

By default, if less than 10% of the analyzed data in a document consists of non-ASCII characters, it is detected as a text file. Depending on the type of files that you are working with, changing the default percentage might increase detection accuracy.

To change the percentage of non-ASCII characters allowed in text files

· In the formats.ini file, set the following flag in the detection_flags section:

  [detection_flags]
  non_ascii_chars_in_text=N

  where N is the percentage of non-ASCII characters to allow in text files. Files that contain a lower percentage of non-ASCII characters than N are detected as text files. The default value is 10.

Allow Consecutive NULL Bytes in a Text File

By default, if a document contains consecutive NULL bytes, it is not detected as text. Depending on the type of files that you are working with, changing the default might increase detection accuracy.

To allow consecutive NULL bytes in text files

· In the formats.ini file, set the following flag in the detection_flags section:

  [detection_flags]
  ascii_allow_null_bytes=1

  The default value is 0 (do not allow consecutive NULL bytes).

Use the File Extension for Detection

Sometimes KeyView detects certain file formats, such as CSV, as ASCII because of the content of the documents. In such cases, you can configure KeyView to use the file extension to determine the document format. Using the file extension can improve detection of formats such as CSV, but text files that have incorrect file extensions might not be detected successfully.
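The default text-detection rule described under Refine Detection of Text Files (sample a block at the start and at the end of the file, then compare the share of non-ASCII bytes with a threshold) can be sketched in Java as follows. This is an illustration of the documented rule only, not KeyView's implementation: the class and method names are hypothetical, "non-ASCII" is assumed to mean a byte value above 0x7F, and the handling of consecutive NULL bytes and of the file extension is left out.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative sketch only; not KeyView code. */
final class TextDetectionSketch {

    /**
     * Returns true when the sampled data contains a lower percentage of
     * non-ASCII bytes than maxNonAsciiPercent (strictly lower, as documented).
     * An empty input yields false because there is nothing to sample.
     */
    static boolean looksLikeText(byte[] data, int headBytes, int tailBytes, int maxNonAsciiPercent) {
        int headEnd = Math.min(headBytes, data.length);
        // Start the tail block after the head block so that small files are not counted twice.
        int tailStart = Math.max(headEnd, data.length - tailBytes);
        int sampled = headEnd + (data.length - tailStart);
        int nonAscii = countNonAscii(data, 0, headEnd) + countNonAscii(data, tailStart, data.length);
        return nonAscii * 100 < maxNonAsciiPercent * sampled;
    }

    // Assumption: a byte is "non-ASCII" when its unsigned value is above 0x7F.
    private static int countNonAscii(byte[] data, int from, int to) {
        int count = 0;
        for (int i = from; i < to; i++) {
            if ((data[i] & 0xFF) > 0x7F) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] sample = "id,name\n1,example\n".getBytes(StandardCharsets.US_ASCII);
        // 1024-byte blocks and a 10 percent threshold mirror the documented defaults.
        System.out.println(looksLikeText(sample, 1024, 1024, 10));
    }
}
```

In this sketch, raising tailBytes corresponds to non_ascii_chars_end_block_size and changing maxNonAsciiPercent corresponds to non_ascii_chars_in_text.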
To use the file extension for ASCII files during detection

· In the formats.ini file, set the following flag in the detection_flags section:

  [detection_flags]
  use_extension_for_ascii=1

  The default is 0 (do not use the file extension).

Translate Format Information

Format information can include file attributes in the following categories:

· Major format
· File class
· Minor format
· Major version
· Minor version

Not all categories are required. Many formats only include major format and file class, or major format only.

The format information has the following structure:

MajorFormat.FileClass.MinorFormat.MajorVersion.MinorVersion

For example:

81.2.0.9.0

Each number in the format information represents a file attribute. The entry 81.2.0.9.0 represents a Lotus 1-2-3 Spreadsheet file version 9.0, where

81 = Lotus 1-2-3 Spreadsheet (major format)
2 = Spreadsheet (file class)
0 = not defined (minor format)
9 = 9 (major version)
0 = 0 (minor version)

This example applies to the formats.ini file. When extracting format information using the getDocFormatInfo methods, the same format is represented as 294.2.9.0.

NOTE: The format values returned from getDocFormatInfo differ from those in formats.ini because the former defines a unique ID for each major format, while the latter uses a major version, minor version, and minor format to distinguish between formats.

Distinguish Between Formats

The DocFormatInfo class provides a unique ID for each major format. For example, a call to getDocFormatInfo would return 351.1.0 for a Microsoft Word XML format. The major format 351 is unique to this format.

Unlike DocFormatInfo, the formats.ini file distinguishes between formats by using the major version number.
For example, in the formats.ini file, a Microsoft Word 2003 XML format is defined as 285.1.0.100.0. The major format 285 and file class 1 are the same values as for generic XML. The major version 100 distinguishes the format as Microsoft Word 2003 XML.

The major version is used to specify the following formats:

• Microsoft Office 2003 XML. This format has the same major format and file class as generic XML (285.1). It is distinguished from generic XML by using the following major versions:
    ◦ Word: 100
    ◦ Excel: 101
    ◦ Visio: 110
• XHTML. This format has the same major format and file class as HTML (210.1). It is distinguished from HTML by using the major version 100.

Determine a Document Reader

The format detection module uses the formats.ini file to determine whether a format is supported, and to determine the reader to use to parse a format. The entries in the formats.ini file list each format's coded value, and an abbreviation for the format's reader. The reader abbreviation is a truncated version of the reader's library name. Adding "sr" to the end of an abbreviation creates the name of the reader.

For example, the following entry specifies that a Lotus 1-2-3 Spreadsheet file version 9.0 is parsed by the Lotus 1-2-3 filter, l123sr:

    81.2.0.9.0=l123

List of Required Files for Redistribution, on page 243 lists the readers provided with KeyView.

Additional Format Information

The ADDOCINFO class returns basic information about a document's format, but sometimes it can be useful to have additional information. The file formats_description.tsv, which can be found in the bin directory, provides a mapping between file format ID, human-readable format description, and the format's MIME type (if one exists). This file is in tab-delimited format, and the tab character appears only as a delimiter.

This information is available in the documentation (see the section Supported Formats, on page 107), but the TSV file provides it in a machine-readable format.
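The entry layout described under Translate Format Information and Determine a Document Reader can be illustrated with a small parser. The following is only a sketch: the FormatEntry type and the parse_format_entry() function are hypothetical names that are not part of the KeyView API. It assumes that an entry has the form numbers=abbreviation, that the numbers are assigned to categories from left to right, and that any category missing from the entry is treated as 0.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for one formats.ini entry; not a KeyView type. */
typedef struct {
    int major_format;
    int file_class;
    int minor_format;
    int major_version;
    int minor_version;
    char reader[64];            /* reader library name, for example "l123sr" */
} FormatEntry;

/* Parses an entry such as "81.2.0.9.0=l123".
 * Assumption: categories that are not present in the entry default to 0.
 * Returns 1 on success and 0 if the line is not a usable entry. */
int parse_format_entry(const char *line, FormatEntry *out)
{
    int v[5] = {0, 0, 0, 0, 0};
    char abbrev[60];
    const char *eq = strchr(line, '=');

    if (eq == NULL)
        return 0;
    /* sscanf stops at the first field that does not match, so shorter
     * entries simply leave the remaining values at 0. */
    if (sscanf(line, "%d.%d.%d.%d.%d", &v[0], &v[1], &v[2], &v[3], &v[4]) < 1)
        return 0;
    if (sscanf(eq + 1, "%59[A-Za-z0-9_]", abbrev) != 1)
        return 0;

    out->major_format  = v[0];
    out->file_class    = v[1];
    out->minor_format  = v[2];
    out->major_version = v[3];
    out->minor_version = v[4];
    /* Adding "sr" to the abbreviation gives the reader name. */
    snprintf(out->reader, sizeof out->reader, "%ssr", abbrev);
    return 1;
}
```

With the entry 81.2.0.9.0=l123, this sketch yields major format 81, file class 2, major version 9, and the reader name l123sr. A real formats.ini file also contains section headers and flags, which this function rejects or ignores rather than interprets.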
Appendix G: List of Required Files for Redistribution

This section lists the Filter files that can be redistributed in your applications under the licensing agreement. Unless noted, these files are in the directory install\OS\bin, where install is the path of the Filter installation directory and OS is the operating system platform.

NOTE: On Windows systems, the libraries are .dll files. On UNIX systems, the libraries are .so, .a, or .sl files.

Core Files

The following core files can be redistributed with your application.

File                      Description
formats.ini               Initialization file. For more information on this file, see Determine Format Support, on page 238.
FilterDotNet.dll          The .NET API.
filterfordotnet.dll       Required by the .NET API.
KeyView.jar               The Java API. NOTE: This file can be found at the path install/javaapi/KeyView.jar where install is the Filter SDK installation directory.
*KeyViewFilter.*          Required by the Java API.
kpifcnvt.*                For presentation graphics, converts from one picture format to another.
kpifutil.*                Utility for handling the internal picture interchange format for presentation graphics.
kvfilter_nsl.a            (AIX platforms only.) Alternative Filter API implementation using POSIX standards for starting new processes. See The Filter Process Model, on page 25.
kvxtract.*                File Extraction API.
kvfilter.*                Filter API.
kvolefio.*                Embedded OLE object writer.
kvutil.*                  Internal KeyView utility functions.
kvxpgsa.*                 Interface between presentation readers and kvfilter. Required to extract metadata from AutoCAD files.
kvxsssa.*                 Interface between spreadsheet readers and kvfilter.
kvxwpsa.*                 Interface between word processing readers and kvfilter.
kvzip.*                   Zip writer.
kwad.*                    File auto-recognition module.
txtcnv.*                  Converter for document token stream.
vcredist\*                (Windows platforms only) Microsoft Visual C++ Redistributable Packages. For more information about these files, see Software Dependencies, on page 14. NOTE: The vcredist folder is located at the root of the SDK, and not in the bin directory.

Support Files

The following support files can be redistributed with your application.

File                      Description
datafiles\*               (Folder) Required by kvlangdetect.
NSFtemplates\*            (Folder) Templates used by nsfsr to format Lotus mail notes.
7z.*                      Required by z7zsr and multiarcsr.
bentofio.*                Required by l123sr and kpprzrdr.
cbmap.map                 Character mappings for Adobe Portable Document Format (PDF).
CEBDLL.dll                Required by cebsr.
chartbls.ux               Character mappings.
chmdll.*                  Required by chmsr.
codeidentifierplugin.*    Required for source code identification.
cpstsdk.*                 Required by pstxsr.
*cryptographyservices.*   Used for decrypting the data in RMS protected documents when credentials have been configured.
DFECore.dll               Required by cebsr.
Filter.dll                Required by cebsr.
kpbmpwrt.*                Required for processing bmp files.
kppng.*                   Required for ZLIB decompression.
kvdecrypt.*               Decryption utility functions.
kvlangdetect.*            Utility functions for language and character set detection.
kvxconfig.ini             Contains element extraction settings for XML files.
kvoop.*                   Required for out-of-process filtering.
kvthread.*                Required for multithreaded out-of-process filtering.
kv.lic                    Contains license information for KeyView products. This file is opened and validated when a KeyView API is used.
*langdetectext.*          Required by kvlangdetect.*.
libpff.*                  Required by pffsr.
libcrypto*                SSL utility functions used by KeyView mail format readers.
libstlport.so.1           (Solaris platforms only) Solaris Studio Redistributable. This file is located in install/OS/lib.
tabledata.dat             Required for table detection.
unzipjpg.*                Required for JPEG decompression.
wpmap.*                   Extended character mapping for WordPerfect and Corel Presentation.
xmlsh.*                   Contains a library of content handlers for each XML file type. Required by the Expat XML parser.

Document Readers

The following readers can be redistributed with your application.

File                      Description
ad1sr.*                   AD1 Evidence file reader
afsr.*                    ASCII reader
aiffsr.*                  Audio Interchange Format File (AIFF) reader
asfsr.*                   Advanced Systems Format reader
assr.*                    Applix Spreadsheet reader
awsr.*                    Applix Word reader
b1sr.*                    B1 archive reader
bkfsr.*                   Microsoft Backup File reader
bmpsr.*                   Windows bitmap (BMP) reader
bzip2sr.*                 Bzip2 reader
cabsr.*                   Microsoft Cabinet format reader
cebsr.*                   Founder Chinese E-paper Basic reader
chmsr.*                   Microsoft Compiled HTML Help reader
csvsr.*                   Comma-Separated Values reader
dbfsr.*                   dBase Database reader
dbxsr.*                   Microsoft Outlook Express DBX reader
dcasr.*                   Document Content Architecture/Revisable Form Text (DCA/RFT) reader
dcmsr.*                   Digital Imaging and Communications in Medicine (DICOM) reader
difsr.*                   Data Interchange Format reader
dmgsr.*                   Mac Disk Copy Disk Image File reader
dw4sr.*                   DisplayWrite reader
dxlsr.*                   Domino XML Language reader
emlsr.*                   Microsoft Outlook Express (EML) reader. This is used to filter EML files when the MBX reader is not licensed.
emxsr.*                   Legato EMailXtender (EMX) reader
encasesr.*                Expert Witness Compression Format (EnCase) v6 reader
encase2sr.*               Expert Witness Compression Format (EnCase) v7 reader
entsr.*                   Microsoft Entourage Database Format reader
epubsr.*                  Open Publication Structure eBook reader
foliosr.*                 Folio Flat File reader
gdsiisr.*                 Graphic Database System (GDSII) reader
gifsr.*                   Graphics Interchange Format (GIF) reader
gwfssr.*                  GroupWise FileSurf reader
hl7sr.*                   Health Level 7 reader (metadata only)
htmsr.*                   HTML and XHTML reader
hwpsr.*                   Hangul 97 reader
hwposr.*                  Hangul 2002, 2005, 2007 reader
ichatsr.*                 Apple iChat Log reader
icssr.*                   Microsoft Outlook iCalendar reader
isosr.*                   ISO-9660 CD Disc Image Format reader
iwss13sr.*                iWork 13 Numbers reader
iwwp13sr.*                iWork 13 Pages reader
iwwpsr.*                  Apple iWork Pages reader
iwsssr.*                  Apple iWork Numbers reader
jp2000sr.*                JPEG 2000 metadata reader
jpgsr.*                   JPEG metadata reader
jtdsr.*                   JustSystems Ichitaro reader
kpagrdr.*                 Applix Presentations reader
kpcatrdr.*                CATIA format reader
kpcgmrdr.*                Computer Graphics Metafile reader
kpdwgrdr.*                AutoCAD Drawing format reader
kpdxfrdr.*                AutoCAD Drawing Exchange format reader
kpemfrdr.*                Enhanced Metafile reader
kpgflrdr.*                Omni Graffle reader
kpgifrdr.*                Graphic Interchange Format (GIF) reader
kpiwpg13rdr.*             iWork 13 Keynote reader
kpiwpgrdr.*               Apple iWork Keynote reader
kpjbig2rdr.*              JBIG2 reader
kpjp2000rdr.*             JPEG 2000 reader
kpmsordr.*                Microsoft Office Drawing Objects (Office 97, 2000, and XP) reader
kpnbmprdr.*               Notes Bitmap reader (for embedded images in DXL files)
kpodardr.*                AutoCAD reader
kpodfrdr.*                Oasis Open Document Format presentation (ODP) reader
kpoxdrdr.*                Open Office XML Diagram Graphics reader
kpp40rdr.*                Microsoft PowerPoint PC 4.0 and PowerPoint Mac reader
kpp95rdr.*                Microsoft PowerPoint 95 reader
kpp97rdr.*                Microsoft PowerPoint 97 and higher reader
kppctrdr.*                Macintosh Quick Draw Picture (PICT) reader
kppicrdr.*                Pictor PC Paint (PIC) reader
kppngwrt.*                Portable Network Graphics (PNG) reader
kpppxrdr.*                Microsoft PowerPoint 2007 XML reader
kpprerdr.*                Lotus Freelance Graphics for Windows V2.0 reader
kpprzrdr.*                Lotus Freelance Graphics 96/97/98 reader
kpsddrdr.*                StarOffice Impress reader
kpsdwrdr.*                Lotus Ami Pro Graphics reader
kpshwrdr.*                Corel Presentations reader
kptifrdr.*                Tagged Image File (TIF) reader
kpugrdr.*                 Unigraphics (UG) NX reader
kpvsd2rdr.*               Microsoft Visio reader
kpvsdxrdr.*               Microsoft Visio 2013 reader
kpwg2rdr.*                WordPerfect Graphics 2 reader
kpwmfrdr.*                Windows Metafile reader
kpwpgrdr.*                WordPerfect Graphics 1 reader
kpxfdlrdr.*               Extensible Forms Description Language reader
kvgzsr.*                  GZIP reader
kvhqxsr.*                 BinHex reader
kvzeesr.*                 UNIX Compress reader
l123sr.*                  Lotus 123 v96/97/98 reader
lasr.*                    Lotus AMI Pro reader
ltbenn30.dll              Lotus Word Pro support (supported on Windows x86 platform only)
ltscsn10.dll              Lotus Word Pro support (supported on Windows x86 platform only)
lwpapin.dll               Lotus Word Pro support (supported on Windows x86 platform only)
lwppann.dll               Lotus Word Pro support (supported on Windows x86 platform only)
lwpsr.dll                 Lotus Word Pro reader (supported on Windows x86 platform only)
lzhsr.*                   Microsoft Compression Folder reader
macbinsr.*                MacBinary reader
mbsr.*                    Microsoft Word Macintosh reader
mbxsr.*                   Mailbox (MBX) and Microsoft Outlook Express (EML) reader [1]
mdbsr.*                   Microsoft Access reader
mhtsr.*                   MIME HTML reader
mifsr.*                   Adobe Maker Interchange reader
misr.*                    Microsoft Word 2 reader
mp3sr.*                   MP3 reader for metadata extraction
mpeg4sr.*                 MPEG-4 Audio file reader
mppsr.*                   Microsoft Project reader
msgsr.*                   Microsoft Outlook (MSG) reader
mspubsr.*                 Microsoft Publisher reader
msw6sr.*                  Microsoft Works 6 and 2000 reader
mswsr.*                   Microsoft Works V1 and 2 reader
multiarcsr.*              ARJ reader
mw6sr.*                   Microsoft Word 95 reader
mw8sr.*                   Microsoft Word 97, 2000, and XP reader
mwsr.*                    Microsoft Word for DOS and Microsoft Write reader
mwssr.*                   Microsoft Works Spreadsheet reader
mwxsr.*                   Microsoft Word 2007 XML reader
nsfsr.*                   Lotus Notes database reader [1]
oa2sr.*                   Fujitsu Oasys reader
odfsssr.*                 Oasis Open Document Format spreadsheets (ODS) reader
odfwpsr.*                 Oasis Open Document Format word processing (ODT) reader
olesr.*                   Embedded OLE object reader
olmsr.*                   Microsoft Outlook for Macintosh reader
onealtsr.*                Microsoft OneNote Alternate Format reader
onesr.*                   Microsoft OneNote Format reader
pmesr.*                   Plazmic Media Engine data file reader
onmsr.*                   Legato EMailXtender Native Message reader
oo3sr.*                   Omni Outliner reader
pbixsr.*                  Microsoft Power BI file (PBIX) reader
pdf2sr.*                  Alternative Adobe Portable Document Format file (PDF) reader
pdfsr.*                   Adobe Portable Document Format file (PDF) reader
pfilesr.*                 Microsoft Rights Management System encryption file reader
pffsr.*                   Microsoft Outlook Offline Storage File reader
pngsr.*                   Portable Network Graphics (PNG) reader
psdsr.*                   Adobe Photoshop Document (PSD) reader
pstsr.dll                 Microsoft Outlook Personal Folders file MAPI-based reader (supported on Windows platform only) [1]
pstnsr.*                  Microsoft Outlook Personal Folders file native reader [1]
pstxsr.*                  Microsoft Outlook Personal Folders file native reader [1]
qpssr.*                   Corel Quattro Pro spreadsheet reader
qpwsr.*                   Corel Quattro Pro version X4 spreadsheet reader
rarsr.*                   RAR Archive reader
riffsr.*                  Microsoft WAVE reader
rtfsr.*                   Microsoft Rich Text reader
skypesr.*                 Skype log file reader
sosr.*                    StarOffice/OpenOffice reader
starcsr.*                 StarOffice Calc reader
starwsr.*                 StarOffice Writer reader
sunadsr.*                 Sun Audio Data reader
swfsr.*                   Macromedia Flash reader
tarsr.*                   Tape archive reader
tifsr.*                   TIFF reader (metadata only)
tnefsr.*                  Transfer Neutral Encapsulation Format
unihtmsr.*                Unicode HTML reader
unisr.*                   Unicode reader
unzip.*                   Zip file reader
utf8sr.*                  UTF-8 reader
uudsr.*                   UUEncoding reader
vcfsr.*                   Microsoft Outlook vCard Contact reader
vsdsr.*                   Microsoft Visio reader
wkssr.*                   Lotus 123 v2.0 through 5.0 reader
wosr.*                    WordPerfect 5.x reader
wp6sr.*                   WordPerfect 6.0 through 10.0 reader
wpmsr.*                   WordPerfect for Macintosh reader
xlsbsr.*                  Microsoft Office 2007 Excel Binary Format reader
xlssr.*                   Microsoft Excel reader
xlsxsr.*                  Microsoft Excel 2007 XML reader
xmlsr.*                   Generic XML reader
xpssr.*                   XML Paper Specification reader
xywsr.*                   XYWrite reader
yimsr.*                   Yahoo! Instant Messenger reader
z7zsr.*                   7-Zip reader

[1] This reader is an advanced feature and is sold and licensed separately from KeyView Filter SDK. See License Information, on page 17.

Appendix H: Develop a Custom Reader

This section describes how to develop a reader for a format not supported by KeyView.

· Introduction
· How to Write a Custom Reader
· Development Tips
· Functions

Introduction

The Filter SDK enables you to write custom readers for formats not directly supported by KeyView. A reader is required to parse the file format and generate a KeyView token stream, which represents the content and format of the document. Filter can then use this token stream to generate a text version of the original document. The readers interact with a structured access layer and a writer to generate a text file in Filter, an HTML file in HTML Export, an XML file in XML Export, and a near-to-original view of the document in the Viewing SDK.
The complexity of a custom reader depends on the file format used by the source document type. A simple reader extracts only the textual content, but ignores formatting and all other non-textual content. Readers of increasing complexity must address one or more of the following:

• formatting (including fonts, foreground and background colors, paragraph borders and shading, character and paragraph styles)
• tables
• lists
• headers
• footers
• footnotes
• endnotes
• graphics
• bookmarks to internal links
• hyperlinks to external documents or webpages
• other structures, such as a table of contents or index

Even a simple reader might have to parse the following components of a document:

• word processing commands or tags
• encrypted or encoded text
• multiple character sets
• text modified, but retained within the file
• text displayed in an order other than its physical occurrence within the source file

It is very important to fully understand the file specification for the file format used by the document. This is essential in determining how to parse the source file and generate a token stream that accurately and effectively represents the original document.

Within Filter, the custom reader must interact with a structured access layer and the format detection API, which in turn interacts with the top-level API. For a description of the Filter architecture, see Architectural Overview, on page 20.

The custom reader must have a module definition file (*.def) that defines the exported API function calls. In addition, the formats.ini file must be modified to identify the custom reader and its associated format detection function.

See the source code for the sample custom reader (utf8sr), which parses plain text files encoded in UTF-8. The source code is in the directory install/samples/utf8sr, where install is the path name of the Filter installation directory.
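To make the idea of a simple reader more concrete, the following sketch extracts only the text from an invented markup in which commands are enclosed in angle brackets. It is not KeyView code and does not produce KeyView tokens: the TagStripState type and the strip_tags() function are hypothetical, and the tag syntax is an assumption made for illustration. The parser state lives in a structure supplied by the caller, so a command that is split across two input chunks is still skipped correctly.

```c
#include <stddef.h>

/* Hypothetical parser state. It is kept in a caller-owned structure rather
 * than in a global variable, so that each document has its own state. */
typedef struct {
    int in_tag;                 /* nonzero while inside a <...> command */
} TagStripState;

/* Copies the textual content of one input chunk to out and skips everything
 * between '<' and '>'. The out buffer must hold at least in_len bytes.
 * Returns the number of bytes written to out. */
size_t strip_tags(TagStripState *st, const char *in, size_t in_len, char *out)
{
    size_t i;
    size_t written = 0;

    for (i = 0; i < in_len; i++) {
        char c = in[i];
        if (st->in_tag) {
            if (c == '>')
                st->in_tag = 0;         /* command ends, text resumes */
        } else if (c == '<') {
            st->in_tag = 1;             /* command starts, stop copying */
        } else {
            out[written++] = c;         /* plain text is copied through */
        }
    }
    return written;
}
```

If the first chunk ends inside a command, in_tag is still set when the second chunk arrives, so the remainder of the command is discarded instead of being emitted as text. A real reader has to deal with far more than this, including character sets, encodings, and the document structures listed above.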
How to Write a Custom Reader

Two include files define the requirements for a custom reader: kvcfsr.h and kvtoken.h.

The definitions of the KeyView tokens are in kvtoken.h. For more information on tokens, see Token Buffer, on the next page.

The file kvcfsr.h defines two structures: TPReaderInterface and adTPDocInfo.

The TPReaderInterface structure defines the API functions implemented by the custom reader. For basic readers, only the first four functions must be implemented. These functions are called by the structured access layer to parse the source file and generate the token stream.

All readers must be thread-safe. This means that global variables must not be used. To pass information between functions, it is necessary to define a "global" context structure that stores all information required throughout the life of the DLL. The initial parameter of all but one of the TPReaderInterface functions is a pointer to a global context structure defined for the custom reader.

The adTPDocInfo structure defines the information required for the format detection API, which associates the custom reader with the required file format.

Naming Conventions

Use the following naming conventions for functions and files:

• The initial letters of the custom reader file name should identify the file format being parsed. For example, pdf for Adobe PDF files, rtf for RTF files, and xls for Microsoft Excel files. In the examples in this appendix, this is represented by xxx.
• The name of the shared library must end with the letters sr.
• The names of the exported functions in the module definition file must be xxxGetReaderInterface and xxxsrAutoDet.

NOTE: The letters sr are excluded from xxxGetReaderInterface, but are included in xxxsrAutoDet.

Basic Steps

The basic steps for developing a custom reader are as follows.

To develop a custom reader

1. Design the global context structure.
2. Write the basic API functions:
   • xxxAllocateContext()
   • xxxInitDoc()
   • xxxFillBuffer()
   • xxxFreeContext()
   • xxxCharSet()
   • xxxsrAutoDet()
   From within the xxxFillBuffer() function, it is necessary to call other functions that repeatedly read a chunk of a source file, parse the chunk, and generate a token stream until the entire source file is processed.
3. Map all but the last function to the TPReaderInterface structure.
4. Write the module definition file (*.def), exporting the reader interface and format detection functions.
5. Modify the formats.ini file to identify the custom reader and its associated format detection function. See xxxsrAutoDet(), on page 265. For example, the following lines would be added to the [Formats] section of the formats.ini file for the UTF-8 reader:

    456.1.0.0=utf8

    [CustomFilters]
    1=utf8sr

Token Buffer

Filter technology parses the native file structure to generate an intermediate stream called a token buffer. The token buffer consists of multiple sequences of tokens, which are defined in kvtoken.h and listed below.

    #define KVT_TEXT          0x00 /* PutText() */
    #define KVT_PARAINFO      0x01 /* SetParaInfo() */
    #define KVT_SETTABS       0x02 /* SetTabs() */
    #define KVT_TAB           0x03 /* Tab() */
    #define KVT_MODE          0x04 /* SetMode() */
    #define KVT_PARASPACE     0x05 /* SetParaSpace() */
    #define KVT_ROWDEFN       0x06 /* DefineRow(), EndTable() */
    #define KVT_COLUMNS       0x07 /* StartColumns(), etc. */
    #define KVT_CELLSTART     0x08 /* NextCell() */
    #define KVT_BITMAP        0x09 /* Reserved for annotations. */
    #define KVT_PAGEOBJ       0x0A /* PutHeader(), PrintPage(), etc. */
    #define KVT_NOOP          0x0B /* Just skip a BYTE. */
    #define KVT_PAGE_BREAK    0x0C /* PageBreak() */
    #define KVT_PARA_BREAK    0x0D /* ParaEnd() */
    #define KVT_LINE_BREAK    0x0E /* LineBreak() */
    #define KVT_SET_FONT      0x0F /* SetFont() */
    #define KVT_PAGE          0x10 /* SetPageInfo() */
    #define KVT_HOTSPOT       0x11 /* StartHotSpot() */
    #define KVT_LINESPACE     0x12 /* SetLineSpacing() */
    #define KVT_COLOR         0x13 /* VESetTextColor(), VESetBkColor() */
    #define KVT_PICTURE       0x14 /* PutPicture() */
    #define KVT_CELLMERGE     0x15 /* MergeCells() */
    #define KVT_RULE          0x16 /* HorzRule() */
    #define KVT_PATTERN       0x17 /* StartPattern(), etc. */
    #define KVT_BORDER        0x18 /* StartParaBorder(), etc. */
    #define KVT_HEADING       0x19 /* PutParaHeading() */
    #define KVT_LISTING       0x1A /* StartList(), etc. */
    #define KVT_CHARSET       0x1B /* SetCharSet() */
    #define KVT_STYLE         0x1C /* PutCharStyle(), PutParaStyle() */
    #define KVT_BIDI          0x1D /* Set Bidirectional text */
    #define KVT_LOCALE        0x1E /* Set locale of a document */
    #define KVT_ZONE          0x1F /* StartZone(), EndZone() */
    #define KVT_POSITION      0x20 /* SetPosition(), etc. */
    #define KVT_AUTOREC       0x21 /* Reserved for Internal Use */
    #define KVT_METADATA      0x22 /* Reserved for Internal Use */
    #define KVT_BYTEORDER     0x23 /* SetByteOrder() */
    #define KVT_PARASPACEAUTO 0x24 /* SetParaSpaceAuto() */
    #define KVT_ATTACH        0x25 /* PutAttachment() */
    #define KVT_TOCPRINTIMAGE 0x26 /* StartTOCPrintImage(), etc. */
    #define KVT_STREAM        0x27 /* PutStream(), Reserved */
    #define KVT_REVISIONMARK  0x28 /* StartRevisionMark(), EndRevisionMark(),
                                      SetRMAuthor(), SetRMDateTime() */
    #define KVT_DOCXTRINFO    0x29 /* SetDocXtrInfo() */
    #define KVT_PCTEMDFT      0x30 /* SetPctEmdFt() */

A token is a single-byte identifier that corresponds to attributes in a document. Each token has one or more associated macros that provide detailed information about an attribute. Many of these tokens define components of the document, such as page margins, line indentation, and foreground and background color. Collectively, these are referred to as the state of the document.
This state changes as the document is parsed.

Macros

Some of the macros are simple while others are complicated. An example of a simple macro is ParaEnd(pcBuf), which terminates the current paragraph.

    #define ParaEnd(pcBuf) \
    { \
        *pcBuf++ = KVT_PARA_BREAK; \
        KVT_PUTINT(pcBuf, KVTSIZE_PARA_BREAK); \
    }

In Filter SDK, this generates a 0x0d, 0x0a pair of bytes on a Windows machine. In HTML Export this can generate a <p style="..."> element, depending on the value of other paragraph attributes.

One of the more complicated macros is PutPictureEx().

    #define PutPictureEx(pcBuf, lpszKey, cx, cy, flags, \
                         scaleHeight, scaleWidth, \
                         cropFromL, cropFromT, cropFromR, cropFromB, \
                         anchorHorizontal, anchorVertical, offsetX, offsetY) \
    { \
        PutPic(pcBuf, lpszKey, cx, cy, flags, \
               scaleHeight, scaleWidth, \
               cropFromL, cropFromT, cropFromR, cropFromB, \
               anchorHorizontal, anchorVertical, offsetX, offsetY, \
               180, 0, 180, 0, -1, 0, 0, 0, 0) \
    }

You can generate a representation of the token stream by running filtertest.exe with the -d command-line option. This stream does not include the tokens generated for headers or footers. The filtertest.exe file is in the directory install\samples\utf8\bin, where install is the path name of the Filter installation directory.

Reader Interface

All custom readers use the reader interface defined in kvcfsr.h. The members of this structure are:

    fpAllocateContext()
    fpInitDoc()
    fpFillBuffer()
    fpFreeContext()
    fpHotSpothit()
    fpGetSummaryInfo()
    fpOpenStream()
    fpCloseStream()
    fpGetURL()
    fpGetCharSet()

NOTE: fpHotSpothit() and fpGetURL() are currently reserved and must be NULL.

Function Flow

The structured access layer calls the functions as follows:

1. fpAllocateContext() is called and returns a pointer to the global context structure.
2. After further processing within the structured access layer, fpInitDoc() is called. This function performs all required initialization for the global context structure and then returns control to the structured access layer.
3. After further processing within the structured access layer, the fpFillBuffer() function is called repeatedly until the document is completely parsed.
4. Finally, fpFreeContext() is called. This function frees all memory allocated within the custom reader and then returns control to the structured access layer.

Related Topics

• Functions, on page 265

Example Development of fffFillBuffer()

The following is an example of how the fpFillBuffer() function in foliosr could be developed. The example demonstrates how the code changes as limitations of the implementation are identified. With each implementation, code revisions are shown in bold.

Implementation 1: fpFillBuffer() Function

    /*****************************************************************
     * Function: fffFillBuffer()
     * Summary:  Read fff input from stream and parse into kvtoken.h codes
     *****************************************************************/
    int pascal _export fffFillBuffer(
        void *pCFContext,
        BYTE *pcBuf,
        UINT *pnBufOut,
        int  *pnPercentDone,
        UINT  cbBufOutMax )
    {
        BOOL bRetVal;
        TPfffGlobals *pContext = (TPfffGlobals *)pCFContext;

        pContext->pcBufOut = pcBuf;
        fffReadSourceFile(pContext);
        bRetVal = fffProcessBuffer(pContext, pcBuf);
        *pnPercentDone = (int)(pContext->unTotalBytesProcessed * (UINT)100 /
                               pContext->unFileSize);
        *pnBufOut = (UINT)(pContext->pcBufOut - pcBuf);
        return (bRetVal ? KVERR_Success : KVERR_General);
    }

The parameters in fffFillBuffer() are as follows:

Parameter        In/Out    Description
pCFContext       In        A pointer to the context structure of the custom reader.
pcBuf            In/Out    A pointer to the token output buffer.
pnBufOut         Out       A pointer to the number of bytes written to the output buffer.
pnPercentDone    Out       A pointer to the percentage complete.
cbBufOutMax      In        The maximum number of bytes that the token output buffer can hold.

Structure of Implementation 1

1. The local variable pContext is set to the value of the pCFContext void pointer, cast to a pointer to the global context structure for the reader. This provides access to all members of this structure.
2. After setting the pContext variable, a call is made to read the source file.
3. Next, a call is made to fffProcessBuffer(). The second parameter in the call is a pointer to the token output buffer. If this call fails, usually because of memory allocation errors, it returns FALSE.
4. The percentage complete is calculated.
5. The number of bytes written to the token output buffer is calculated. This is based on the value of pContext->pcBufOut, which is increased each time a token is written to the buffer.
6. The function returns to the structured access layer.
7. Subsequent calls to fffFillBuffer() are made by the structured access layer until the percentage complete is 100.

Problems with Implementation 1

• There is a limit to the size of the token output buffer, typically 4 KB. If fffProcessBuffer() generates a token stream larger than this, there is a memory overflow. If fffProcessBuffer() generates a small token stream and the entire file has not been read, the output token buffer is underutilized.
• It might not be possible to process the entire input buffer from the source file because of boundary conditions. An example of a "boundary condition" is when the input buffer terminates part way through a control sequence in the original document. Another file read operation is required before the complete control sequence can be parsed.
• This function might be interrupted by other calls from the structured access layer to process headers, footers, footnotes, and endnotes, or to retrieve the document summary information. This can cause values of variables in the global context to change, and the source file to be repositioned.

Implementation 2: Processing a Large Token Stream

Implementation 2 addresses the problem of processing a token stream that is larger than the output buffer size limit.

    /*****************************************************************
     * Function: fffFillBuffer()
     * Summary:  Read fff input from stream and parse into kvtoken.h codes
     *****************************************************************/
    int pascal _export fffFillBuffer(
        void *pCFContext,
        BYTE *pcBuf,
        UINT *pnBufOut,
        int  *pnPercentDone,
        UINT  cbBufOutMax )
    {
        BOOL bRetVal = TRUE;
        TPfffGlobals *pContext = (TPfffGlobals *)pCFContext;

        pContext->pcBufOut = pcBuf;
        pContext->cbBufOutMax = 9 * cbBufOutMax / 10;

        /* Process the portion of the fff file that is in the input buffer but do
         * not return from the fffFillBuffer() function unless the output buffer is
         * at least 90% full. If any of the memory allocations fail during the
         * execution of fffProcessBuffer(), bRetVal will be set to FALSE, resulting
         * in this conversion failing "gracefully". */
        do
        {
            if( pContext->bBufOutFull )
            {
                pContext->bBufOutFull = FALSE;
            }
            else
            {
                fffReadSourceFile(pContext);
            }
            bRetVal = fffProcessBuffer(pContext, pcBuf);
            *pnPercentDone = (int)(pContext->unTotalBytesProcessed * (UINT)100 /
                                   pContext->unFileSize);
        } while( bRetVal && !pContext->bBufOutFull && *pnPercentDone < 100 );

        *pnBufOut = (UINT)(pContext->pcBufOut - pcBuf);
        return (bRetVal ? KVERR_Success : KVERR_General);
    }

Structure of Implementation 2

1. cbBufOutMax is used to set pContext->cbBufOutMax. This is used in fffProcessBuffer() to monitor how full the token output buffer becomes as the source file is processed.
2. When the source file input buffer has been processed, fffProcessBuffer() returns, and the percentage complete is calculated.
3. If the token output buffer is not filled to a value greater than pContext->cbBufOutMax, pContext->bBufOutFull remains set to FALSE, and if the percentage complete is less than 100, the do-while loop is re-entered without returning from this function to the structured access layer. There is another call to fffReadSourceFile(), followed by fffProcessBuffer().
4. When the token output buffer is filled to a value greater than pContext->cbBufOutMax, pContext->bBufOutFull is set to TRUE. In this case, the do-while loop ends, the number of bytes written to the token output buffer is calculated, and control returns to the structured access layer.
5. The structured access layer continues to make calls to fffFillBuffer() until the entire source file is processed.
6. Each time the structured access layer calls fffFillBuffer(), another empty token output buffer is provided for the custom reader to use.
7. If the previous call to fffFillBuffer() exited because the previous token output buffer exceeded allowable capacity, pContext->bBufOutFull is reset to FALSE and no call is made to read the next buffer from the input source file.

Problems with Implementation 2

• It might not be possible to process the entire input buffer from the source file because of boundary conditions.
• This function might be interrupted by other calls from the structured access layer to process headers, footers, footnotes, or endnotes, or to retrieve the document summary information. This can cause values of variables in the global context to change, and the source file to be repositioned.

Boundary Conditions

A boundary condition can result from many situations arising from input file processing. For example, the input buffer might end with an incomplete command. In Folio flat files, this could be an incomplete element.
In other word processing documents, a boundary condition might result from an incomplete control sequence, a split double-byte character, or a partial UTF-7 or UTF-8 sequence. These can be handled jointly by fffProcessBuffer(), which must detect the boundary condition, and fffReadSourceFile(). The following example shows partial code used in fffReadSourceFile():

    /****************************************************************
     *
     * Function: fffReadSourceFile()
     *
     ***************************************************************/
    int pascal fffReadSourceFile(TPfffGlobals *pContext)
    {
        int nBytes;

        /* Transfer remaining data to beginning of buffer prior to next read.
         * The source and destination lie in the same buffer and can overlap,
         * so memmove() is used rather than memcpy(). */
        if( pContext->nResidualBytes )
        {
            memmove(pContext->cInputBuf, pContext->pcBufIn, pContext->nResidualBytes);
        }

        /* Read from file, without over-writing any text from the previous buffer */
        nBytes = (*pContext->pIO->kwReadFunc)(pContext->pIO,
                     pContext->cInputBuf + pContext->nResidualBytes,
                     BUFFERSIZE - pContext->nResidualBytes);

        /* Update input buffer control parameters */
        pContext->unTotalBytesRead += (UINT)nBytes;
        pContext->pcBufIn = pContext->cInputBuf;
        pContext->pcBufInMax = pContext->pcBufIn + pContext->nResidualBytes + nBytes;
        pContext->nResidualBytes = 0;
        return nBytes;
    }

If fffProcessBuffer() is unable to process the entire input source file buffer, it sets the value for pContext->nResidualBytes. When the next call to fffReadSourceFile() is made, any residual bytes are copied to the beginning of the input source file buffer, and the number of bytes to be read is reduced to make sure that this buffer does not overflow.

A good way to test the code for boundary conditions is to vary the size of BUFFERSIZE and make sure that the results remain consistent.

NOTE: With fffReadSourceFile(), the source file can be read by calls to retrieve header or footer information.
If this occurs, the value for pContext->unTotalBytesRead is incorrect.

Implementation 3--Interrupting Structured Access Layer Calls

Implementation 3 addresses the problem of boundary conditions and interrupting calls from the structured access layer.

    /****************************************************************************
     * Function: fffFillBuffer()
     * Summary:  Read fff input from stream and parse into kvtoken.h codes
     ****************************************************************************/
    int pascal _export fffFillBuffer(
        void *pCFContext,
        BYTE *pcBuf,
        UINT *pnBufOut,
        int  *pnPercentDone,
        UINT  cbBufOutMax )
    {
        double dTotalBytesProcessed, dFileSize;
        BOOL bRetVal = TRUE;
        TPfffGlobals *pContext = (TPfffGlobals *)pCFContext;

        pContext->pcBufOut = pcBuf;
        pContext->cbBufOutMax = 9 * cbBufOutMax / 10;

        /* Process the portion of the fff file that is in the input buffer but do
         * not return from the fffFillBuffer() function unless the output buffer is
         * at least 90% full. If any of the memory allocations fail during the
         * execution of fffProcessBuffer(), bRetVal will be set to FALSE, resulting
         * in this conversion failing "gracefully".
         */
        do
        {
            if( pContext->bBufOutFull )
            {
                pContext->bBufOutFull = FALSE;
            }
            else
            {
                fffReadSourceFile(pContext);
            }
            bRetVal = fffProcessBuffer(pContext, pcBuf);

            if( pContext->bHeaderCompleted )
            {
                *pnPercentDone = 100;
                pContext->bHeaderCompleted = FALSE;
            }
            else if( pContext->bFooterCompleted )
            {
                *pnPercentDone = 100;
                pContext->bFooterCompleted = FALSE;
            }
            else
            {
                if( pContext->unTotalBytesProcessed >= pContext->unFileSize )
                {
                    *pnPercentDone = 100;
                }
                else if( pContext->unFileSize < FFF_MAX_ULONG )
                {
                    *pnPercentDone = (int)(pContext->unTotalBytesProcessed * (UINT)100 /
                                           pContext->unFileSize);
                }
                else
                {
                    dTotalBytesProcessed = pContext->unTotalBytesProcessed;
                    dFileSize = pContext->unFileSize;
                    *pnPercentDone = (int)(dTotalBytesProcessed * 100 / dFileSize);
                }
            }
        } while( bRetVal && !pContext->bBufOutFull && *pnPercentDone < 100 );

        *pnBufOut = (UINT)(pContext->pcBufOut - pcBuf);
        return (bRetVal ? KVERR_Success : KVERR_General);
    }

Structure of Implementation 3

• The most significant change in Implementation 3 is the addition of the code that checks whether the processing of the header or footer is complete. The variables pContext->bHeaderCompleted and pContext->bFooterCompleted are set to TRUE in fffProcessBuffer() when a header or footer is processed and the end of that portion of the document is reached.
• The other piece of code added in Implementation 3 is unique to foliosr. Folio files can be 50 MB or larger. Therefore, an unsigned integer is too small to accurately calculate the percentage complete. If the file size exceeds FFF_MAX_ULONG, which is defined as (UINT)(0xFFFFFFFF / 0x64), doubles are used for that calculation.
• Prior to returning, the token output buffer is as full as possible and never overflows. The minimum number of calls is made.

Development Tips

• Avoid unnecessary initialization. The context variable is allocated in fpAllocateContext().
This structure must be immediately memset() to zero. This sets all BOOL values to FALSE, all pointers to NULL, and all integers to 0. Only members that must be non-zero, non-NULL, or TRUE need to be initialized. This is best done in fpInitDoc().
• Know where you are in the input source file. If you are processing headers, footers, notes, or (in the case of rtfsr) tables, you must be able to reposition the file pointer as required.
• Check buffer boundaries continuously. Whenever you advance through the buffer, you need to know whether there is enough of the input stream to completely process the current command. If not, you need to append the next section of the input file before continuing.
• Strive for a "clean" token stream. Use filtertest with the -d command-line option to generate a token version of the document. If there are redundant tokens, the reader is producing an inefficient token stream. You can keep the token stream free from redundancies by storing the state of the document and then applying the changes only when content is encountered. Content can be text, tabs, or picture objects. The filtertest.exe program is in the directory install\samples\utf8\bin, where install is the path name of the Filter installation directory.
• Avoid large switch() statements whenever possible. They make both development and debugging more complicated than necessary. If there is a fixed set of commands, consider using a hash table that enables you to quickly identify a pointer to the function that handles that command.
• Filtering document metadata is a separate process. Remember that fpGetSummaryInfo() is a completely separate process from the rest of your code. It creates its own context variable structure. It does not have to call fpFillBuffer().
• Use caution when processing headers, footers, and notes.
If you need to process these items, the structured access layer calls fpOpenStream() and fpCloseStream(). It is critical that you save the state of your document and the file pointer position prior to returning from fpOpenStream(). Prior to returning from fpCloseStream(), you must restore the file pointer and the previous state of your document.
• Test your code. The structured access layer for each SDK is unique. Test your code in Filter SDK, Export SDK, and Viewing SDK.

Functions

This section describes the functions used by custom readers to manage the source file and generate token streams required to convert a document.

xxxsrAutoDet()

This function analyzes the source document and determines whether the detected file format requires the custom reader. It is called only when the [CustomFilters] section of the formats.ini file contains an entry identifying the complete file name of the custom reader. For more information on the formats.ini file, see File Format Detection, on page 238.

Syntax

    Bool pascal _export xxxsrAutoDet(
        adTPDocInfo *pTPDocInfo,
        KPTPIOobj   *pIO)

Arguments

    pTPDocInfo   A pointer to the adTPDocInfo structure provided by the structured access layer.
    pIO          A pointer to the I/O stream object for the document processed.

Returns

• TRUE if the file format matches that of the custom reader.
• FALSE if the file format does not match that of the custom reader.

Discussion

• Typically, only the first 1 KB of the file is read into a buffer and analyzed to determine if it matches the file format of the custom reader. If a match is determined, the following four members of the adTPDocInfo structure must be assigned before returning TRUE:

    adClass      Must be set to 1.
    adFormat     A numerical value assigned to this reader in the [Formats] section of the formats.ini file.
    descStr      A string describing the file format.
    mMnmemStr    The initial part of the custom reader file name with the "sr" excluded.

• If the return value is TRUE, the custom reader is used to parse the file and generate the token stream.
• If the return value is FALSE, all other readers in the [CustomFilters] section of the formats.ini file are tried. If no match is found, the file detection process continues checking for the formats supported by Filter SDK.
• The entry in the [Formats] section of the formats.ini file should be of the form aaa.bbb.ccc.ddd, where aaa is the value used for the adFormat parameter, bbb is the value of the file class, ccc is the value of the minor format, and ddd is the value of the major version.

xxxAllocateContext()

This function allocates a global memory block for a data context. A handle to this memory is returned to the structured access layer. The structured access layer passes this handle back to all reader entry points.

Syntax

    void * pascal _export xxxAllocateContext(
        void           *pSALContext,
        LPARAM (pascal *fp)(void *, UINT, LPARAM),
        Bool           *pbOpenDoc,
        TPVAPIServices *pVapi,
        DWORD           dwFlags)

Arguments

    pSALContext  A pointer to the global data context structure of the structured access layer.
    fp           A pointer to a structure of callback functions supported by the structured access layer.
    pbOpenDoc    You must set this BOOL value to TRUE if the allocation of memory for the global data context structure is successful.
    pVapi        A pointer to a structure providing memory management and character conversion functions. Because this functionality is proprietary to Micro Focus, TPVAPIServices is redefined as void in kvcfsr.h.
    dwFlags      Run-time flags controlled by the structured access layer.

Returns

• Upon success, a pointer to the global data context structure for the custom reader. This pointer is passed back to all other custom reader entry points.
• Upon error, a NULL pointer.
This causes the structured access layer to shut down the process.

Discussion

The global context structure should be memset() to zero in this function.

xxxFreeContext()

This function terminates an instance of the custom reader.

Syntax

    int pascal _export xxxFreeContext(void *pCFContext)

Arguments

    pCFContext   A pointer to the global context structure for the custom reader.

Returns

• Upon success, KVERR_Success.
• Upon error, a non-zero error code.

Discussion

All memory that still remains allocated within the custom reader must be freed within this function.

xxxInitDoc()

This function initializes non-zero, non-null members of pContext.

Syntax

    int pascal _export xxxInitDoc(
        void      *pCFContext,
        adDocDesc *pAutoInfo,
        long       lcbFileSize,
        KPTPIOobj *pIO )

Arguments

    pCFContext   A pointer to the global context structure for the custom reader.
    pAutoInfo    A pointer to an adDocDesc structure defined in kwautdef.
    lcbFileSize  The length of the source file in bytes.
    pIO          A pointer to a KPTPIOobj structure defined in kvioobj.h.

Returns

• Upon success, KVERR_Success.
• Upon error, a non-zero error code. This causes the structured access layer to shut down the process.

Discussion

• For custom readers, the pAutoInfo variable can be ignored.
• If the structured access layer has determined the length of the source file, that value is provided by the lcbFileSize parameter. If it is zero, the file size must be determined in this function.
• The pointer pIO provides access to file management functions defined in kvioobj.h.
• In this function, all non-zero, non-NULL members of the global context structure should be initialized.

xxxFillBuffer()

This function controls parsing of the source file and generation of tokens defined in kvtoken.h.
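One detail that any xxxFillBuffer() implementation has to get right is the value it reports through pnPercentDone. The following is a minimal, self-contained sketch of the overflow-safe calculation that Implementation 3 uses, written so that it can be compiled and tested outside the SDK. The names sketch_percent_done and SKETCH_MAX_ULONG are hypothetical stand-ins (the latter for FFF_MAX_ULONG), and treating a zero file size as 100 percent complete is an assumption of this sketch, not documented SDK behavior.

```c
#include <stdint.h>

/* Hypothetical stand-in for FFF_MAX_ULONG: the largest 32-bit value that
 * can still be multiplied by 100 without overflowing. */
#define SKETCH_MAX_ULONG ((uint32_t)(0xFFFFFFFFu / 0x64u))

/* Returns the percentage complete in the range 0..100.
 * Hypothetical helper; not part of the KeyView SDK. */
int sketch_percent_done(uint32_t bytes_processed, uint32_t file_size)
{
    /* Assumption: an empty or unknown-size file is reported as complete,
     * which also avoids a division by zero. */
    if (file_size == 0 || bytes_processed >= file_size) {
        return 100;
    }
    if (file_size < SKETCH_MAX_ULONG) {
        /* bytes_processed < file_size < max/100, so "* 100" cannot overflow. */
        return (int)(bytes_processed * 100u / file_size);
    }
    /* Large file: fall back to floating point, as Implementation 3 does. */
    return (int)((double)bytes_processed * 100.0 / (double)file_size);
}
```

Keeping the integer path for small files avoids floating-point work in the common case, while the double path keeps the result correct for files larger than roughly 42 MB, where multiplying the byte count by 100 would wrap a 32-bit integer.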
Syntax

    int pascal _export xxxFillBuffer(
        void *pCFContext,
        BYTE *pcBuf,
        UINT *pnBufOut,
        int  *pnPercentDone,
        UINT  cbBufOutMax)

Arguments

    pCFContext     A pointer to the global context structure for the custom reader.
    pcBuf          A pointer to a memory buffer to which the tokens are written.
    pnBufOut       A pointer to a variable that specifies the actual number of bytes written to the token buffer.
    pnPercentDone  A pointer to a variable that specifies the percentage completed of the file parsing.
    cbBufOutMax    The maximum number of bytes that can be written to the token buffer.

Returns

• Upon success, KVERR_Success.
• Upon error, a non-zero error code. This causes the structured access layer to shut down the process.

Discussion

• Calls are made to read and parse the source file within this function.
• This function is called repeatedly by the structured access layer until either it returns an error code or the percentage complete is 100.
• The actual number of bytes written to the token buffer must not exceed the value of cbBufOutMax.

xxxGetSummaryInfo()

This function is required to extract document summary information.

Syntax

    int pascal _export xxxGetSummaryInfo(
        void            *pCFContext,
        KVSummaryInfoEx *pInfo,
        BOOL             bFreeInfo)

Arguments

    pCFContext   A pointer to the global context structure for the custom reader.
    pInfo        A pointer to a KVSummaryInfoEx structure defined in kvtypes.h.
    bFreeInfo    A BOOL value indicating whether to free memory allocated for summary information.

Returns

• Upon success, KVERR_Success.
• Upon error, a non-zero error code.

Discussion

This function uses an instance of the global context structure that is different from the one used by all other reader interface functions.
This function can call the same functions used by xxxFillBuffer() or can be completely independent. For more information, see Extract Metadata, on page 59.

xxxOpenStream()

This function is required when initiating processing of peripheral elements such as document headers, footers, footnotes, and endnotes.

Syntax

    int pascal _export xxxOpenStream(
        void *pCFContext,
        int   type,
        int   nOrdinal)

Arguments

    pCFContext   A pointer to the global context structure for the custom reader.
    type         An integer identifying a specific header, footer, footnote, or endnote. Options are defined in kvcfsr.h.
    nOrdinal     An integer identifying a specific header, footer, footnote, or endnote. See the associated macros in kvtoken.h.

Returns

• Upon success, KVERR_Success.
• Upon error, a non-zero error code.

Discussion

A call to this function results in a call to xxxFillBuffer(). The function xxxFillBuffer() provides a new empty output buffer and a new token stream input buffer to process the alternate stream for peripheral elements. In this alternate stream, paragraph and character style properties are likely different from the main body. Therefore, as the document is parsed, the existing values from the main body must be saved. When the processing of the alternate stream is completed and processing of the main body resumes, these values must be restored in xxxCloseStream().

xxxCloseStream()

This function is required when terminating processing for document headers, footers, footnotes, and endnotes.

Syntax

    int pascal _export xxxCloseStream(
        void *pCFContext,
        int   type)

Arguments

    pCFContext   A pointer to the global context structure for the custom reader.
    type         An integer identifying a specific header, footer, footnote, or endnote. Options are defined in kvcfsr.h.

Returns

• Upon success, KVERR_Success.
• Upon error, a non-zero error code.
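The save-and-restore contract between xxxOpenStream() and xxxCloseStream() can be sketched in isolation. The following is a minimal illustration only: SketchState, SketchContext, sketch_open_stream(), and sketch_close_stream() are hypothetical names, the three state fields are placeholders for whatever a real reader tracks, and the return codes 0 and -1 stand in for KVERR_Success and a non-zero error code.

```c
#include <stddef.h>

/* Hypothetical document state; a real reader would track far more. */
typedef struct {
    long file_pos;     /* position in the source file     */
    int  bold;         /* example character property      */
    int  para_indent;  /* example paragraph property      */
} SketchState;

/* Hypothetical reader context holding the live state and one snapshot. */
typedef struct {
    SketchState current;   /* state used while parsing                 */
    SketchState saved;     /* snapshot of the main-body state          */
    int         in_stream; /* non-zero while an alternate stream is open */
} SketchContext;

/* Save the main-body state, then switch to the alternate stream.
 * Returns 0 on success, -1 on error. */
int sketch_open_stream(SketchContext *ctx, long stream_pos)
{
    if (ctx == NULL || ctx->in_stream) {
        return -1;               /* nested streams are not handled here */
    }
    ctx->saved = ctx->current;   /* snapshot before anything changes */
    ctx->current.file_pos = stream_pos;
    ctx->current.bold = 0;       /* alternate stream starts from defaults */
    ctx->current.para_indent = 0;
    ctx->in_stream = 1;
    return 0;
}

/* Restore the main-body state saved by sketch_open_stream().
 * Returns 0 on success, -1 if no stream is open. */
int sketch_close_stream(SketchContext *ctx)
{
    if (ctx == NULL || !ctx->in_stream) {
        return -1;
    }
    ctx->current = ctx->saved;   /* file position and properties together */
    ctx->in_stream = 0;
    return 0;
}
```

Copying the whole state structure in one assignment, rather than member by member, makes it harder to forget a property when new fields are added to the reader's state later.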
Discussion

Prior to exiting this function, the previously saved values in the global context structure must be restored. This ensures that processing of the main body resumes with the correct document state.

xxxCharSet()

This function identifies the character encoding used within the source document.

Syntax

    KVCharSet pascal _export xxxCharSet(
        void *pCFContext,
        BOOL *bMSBLSB)

Arguments

    pCFContext   A pointer to the global context structure for the custom reader.
    bMSBLSB      The BOOL value required for Unicode text. Set this argument to TRUE for Big Endian and FALSE for Little Endian.

Returns

One of the enumerated values defined in the KVCharSet structure in kvcharset.h.

Discussion

If the custom reader can determine the character encoding of the document, the corresponding enumerated value is returned. If the character encoding cannot be determined, KVCS_UNKNOWN is returned.

Appendix I: Password Protected Files

This section lists supported password-protected container and non-container files and describes how to open them.

• Supported Password Protected File Types
• Open Password Protected Container Files
• Filter Password Protected Files

Supported Password Protected File Types

The following table lists the password-protected file types that KeyView supports.

Key to support table

    Symbol  Description
    Y       Format is supported.
    N       Format is not supported.
    S       Support for viewing subfiles.
    V       Support for viewing content.
    P       Password required.
    C       Password and certificate or User ID file required.

Supported password-protected file types

    File Type                Version               Filter  Export  Extract  View  Credentials
    PST (Windows)            n/a                   N       N       Y        S     P
    PST (non-Windows) (1)    n/a                   N       N       Y        S     N
    ZIP                      n/a                   N       N       Y        S     P
    7-Zip                    n/a                   N       N       Y        S     P
    RAR                      n/a                   N       N       Y        S     P
    SMIME in MSG, EML, MBX   n/a                   N       N       Y        N     C
    Lotus Notes NSF          n/a                   N       N       Y        N     C
    Adobe PDF                n/a                   Y       Y       Y        V     P
    Microsoft Office         97-2003, 2007, 2010   Y       Y       Y        V     P

(1) The native PST readers, pstxsr and pstnsr, do not require credentials to open password-protected PST files that use compressible encryption.

Open Password Protected Container Files

This section describes how to extract password-protected container files by using the Java API. The following guidelines apply to specific file types.

• Lotus Notes NSF files. If you are running a Notes client with an active user connected to a Domino server, you must specify the user's password as a credential regardless of whether the NSF files you are opening are protected. This enables KeyView to access the Notes client and the Lotus Notes API. If the Notes client is not running with an active user, KeyView does not require credentials to access the client.
• PST files. To open password-protected PST files that use high encryption (Microsoft Outlook 2003 only), you must use the MAPI-based PST reader (pstsr). The native PST readers (pstxsr and pstnsr) do not support files that use high encryption and return the error message KVERR_PasswordProtected if a PST file is encrypted with high encryption.

To open container files

• Set the credential information to an ExtOpenDocConfig object, and pass it to the extOpenDocument method. For example:

    odconfig = new ExtOpenDocConfig();
    odconfig.setPassword(m_password);
    extContextID = m_objFilter.extOpenDocument(inFile, odconfig);

Filter Password Protected Files

This section describes how to filter password-protected non-container files with the Java API.

To filter password-protected files

• Use the setSourcePassword(java.lang.String pwd) method.
For example:

    objFilter.setSourcePassword(pwd);

where pwd is a null-terminated string of 255 characters or fewer.

Appendix J: Microsoft Rights Management Service Protected Files

This section contains information about KeyView support for Microsoft Rights Management Service (RMS).

• Microsoft Azure Rights Management Service
• Supported Formats

Microsoft Azure Rights Management Service

The Microsoft Rights Management Service (RMS) allows you to classify and optionally encrypt documents. This service forms the rights management part of Microsoft Azure Information Protection (AIP).

For many of the files that Azure RMS can classify and encrypt, KeyView can identify whether they have been encrypted with RMS encryption. It can also extract metadata (including the RMS classification) and XrML associated with the document.

For the KeyView Filter Java SDK, you can provide the credentials required to access protected files by using the Filter.configureRMS() function. This function allows the Filter and File Extraction API functions to operate on the protected data of the file.

When you use Azure RMS decryption, consider the following notes:

• Azure RMS decryption is licensed as an additional product. If your license does not allow for Azure RMS decryption, this function throws a FilterException that returns KVError_ReaderUsageDenied from its getErrorCode() method.
• To access the protected content, KeyView must make an HTTP request. The time required to do so means that KeyView processes protected files more slowly than unprotected files.
• By default, KeyView uses the system proxy when it makes HTTP requests to obtain the key. You can also specify the proxy manually in the configuration file. See Configure the Proxy for RMS, on page 88.
• This function is supported only on certain platforms; see RMS Decryption in the platform differences section.
CAUTION: When Filter or File Extraction API functions access the protected contents of Azure RMS-protected files, KeyView may place decrypted contents into the temporary directory. If you want to manage the security of such files, you might want to change the temporary directory by using Filter.setConfigOption() with the Filter.CFG_SETTEMPDIRECTORY constant.

RMS Credentials

For KeyView to access the protected contents of Microsoft Azure Rights Management System (RMS) protected files, your end-user application must be registered on the relevant Azure domain. For more information about how to register an app, refer to the Microsoft documentation: https://docs.microsoft.com/en-us/azure/active-directory/develop/quickstart-register-app.

After you register an application, you can find the client and tenant IDs in the Azure Portal, in the Overview section. You can find the client secret in the Certificates & Secrets section.

CAUTION: This information is linked to the domain itself, rather than to a specific user. Providing this information allows KeyView to access the contents of all files protected by this domain. Therefore you must handle these three pieces of information securely.

Supported Formats

KeyView support for Azure RMS files depends on the encryption method that Azure RMS uses for each file type, and on whether the file is classified or protected. In Azure RMS, classified files have additional labels to inform users of their sensitivity, while protected files are encrypted so that only authorized users can view them.

In some cases, KeyView format detection returns a different file type depending on whether the file is classified or protected.

The following sections provide information about the Azure RMS support for different file types, and metadata support.
Microsoft Office Files

The following table describes KeyView detected formats for Microsoft Office files that Azure RMS encrypts by creating an OLE container. For these files:

• KeyView can get classification metadata.
• KeyView can detect whether the file is Azure RMS encrypted (the kWindowsRMSEncrypted flag).
• When you configure credentials through Filter.configureRMS(), Filter and File Extraction API functions can operate on the protected data of the file. In this case, you can filter, extract, and get summary information.

In most cases, KeyView can also extract the XrML file for these files when they are protected, and identify the XrML files as KVSubFileType_XrML.

    File extensions      Format detected when file is         Format detected when   XrML
                         classified but not protected         file is protected      extraction
    docx, dotx           MS_Word_2007_Fmt                     MS_Office_2007_Fmt     Yes
    docm, dotm           MS_Word_Macro_2007_Fmt               MS_Office_2007_Fmt     Yes
    pptx, potx, ppsx     MS_PPT_2007_Fmt                      MS_Office_2007_Fmt     Yes
    pptm, potm, ppsm     MS_PPT_Macro_2007_Fmt                MS_Office_2007_Fmt     Yes
    vsdx                 MS_Visio_2013_Fmt                    MS_Office_2007_Fmt     Yes
    vsdm, vssm, vssx,    MS_Visio_2013_Macro_Fmt              MS_Office_2007_Fmt     Yes
    vstm, vstx           MS_Visio_2013_Stencil_Fmt
                         MS_Visio_2013_Stencil_Macro_Fmt
                         MS_Visio_2013_Template_Fmt
                         MS_Visio_2013_Template_Macro_Fmt
    xlsx, xltx           MS_Excel_2007_Fmt                    MS_Office_2007_Fmt     Yes
    xlsm, xlsb, xltm     MS_Excel_Macro_2007_Fmt              MS_Office_2007_Fmt     Yes
                         MS_Excel_Binary_2007_Fmt
    xps                  MS_XPS_Fmt                           MS_Office_2007_Fmt     Yes
    doc, dot             MS_Word_95_Fmt                       MS_Word_95_Fmt         Yes
                         MS_Word_97_Fmt                       MS_Word_97_Fmt
                         MS_Word_2000_Fmt                     MS_Word_2000_Fmt
    ppt, pot, pps        PowerPoint_95_Fmt                    PowerPoint_95_Fmt      Yes
                         PowerPoint_97_Fmt                    PowerPoint_97_Fmt
    xls, xla, xlam, xlt  Excel_Fmt                            Excel_Fmt              Yes
                         Excel_Macro_Fmt                      Excel_Macro_Fmt
                         Excel_95_Fmt                         Excel_95_Fmt
                         Excel_97_Fmt                         Excel_97_Fmt
                         Excel_2000_Fmt                       Excel_2000_Fmt

Implemented as pFile

The following table describes the
KeyView detected formats for files that Azure RMS encrypts by creating a pFile around the document. For these files:

• KeyView can get classification metadata.
• KeyView can detect whether the file is Azure RMS encrypted (the kWindowsRMSEncrypted flag).
• KeyView can extract the XrML if the file is protected.
• When you configure credentials through Filter.configureRMS(), Filter and File Extraction API functions can operate on the protected data of the file. In this case, you can filter, extract, and get summary information.

    File extensions  Format detected when file is   Format detected when   Notes
                     classified but not protected   file is protected
    pfile            n/a                            RMS_Protected_Fmt
    vsd              MS_Visio_Fmt                   RMS_Protected_Fmt
    vdw, vss, vst    MS_Visio_Fmt                   RMS_Protected_Fmt
    mpp, mpt         MS_Project_4_Fmt               RMS_Protected_Fmt
                     MS_Project_41_Fmt
                     MS_Project_98_Fmt
                     MS_Project_2000_Fmt
                     MS_Project_2007_Fmt
    pub              MS_Publisher_98_Fmt            RMS_Protected_Fmt
    jpg              JPEG_File_Interchange_Fmt      RMS_Protected_Fmt      Protected format has extension pjpg. When classified
                                                                           but not protected, the classification metadata is XMP.
    png              PNG_Fmt                        RMS_Protected_Fmt      Protected format has extension ppng.
    gif              GIF_89a_Fmt                    RMS_Protected_Fmt      Protected format has extension pgif. When classified
                                                                           but not protected, the classification metadata is XMP.
    tif              TIFF_Fmt                       RMS_Protected_Fmt      Protected format has extension ptif. When classified
                                                                           but not protected, the classification metadata is XMP.
    dng              TIFF_Fmt                       RMS_Protected_Fmt      When classified but not protected, the classification
                                                                           metadata is XMP.
    dwfx             MS_XPS_Fmt                     RMS_Protected_Fmt      When classified but not protected, dwfx is detected
                                                                           and treated as XPS.
    psd, psb         PSD_Fmt                        RMS_Protected_Fmt      When classified but not protected, the classification
                                                                           metadata is XMP.

PDF Files

The following table describes the KeyView detected formats for PDF documents, which Azure RMS encrypts by creating an encrypted PDF (in which each stream and metadata value is encrypted), wrapped in a container PDF. KeyView allows you to extract the encrypted PDF from the container, and then for the extracted file:

• KeyView can detect whether the file is Azure RMS encrypted (the kWindowsRMSEncrypted flag).
• KeyView can extract the XrML if the file is protected.
• When you configure credentials through Filter.configureRMS(), Filter and File Extraction API functions can operate on the protected data of the file. In this case you can filter, extract, and get summary information for PDF formats.

    File extensions  Format detected when file is   Format detected when
                     classified but not protected   file is protected
    pdf              PDF_Fmt                        PDF_Fmt
                     PDF_Portfolio_Fmt              PDF_Portfolio_Fmt

Restricted Permission Messages

Azure RMS encrypts email messages by creating an encrypted rpmsg attachment, which contains the original message body and attachments, attached to an unencrypted container message, which contains the message metadata. KeyView can extract the metadata and the encrypted rpmsg from the container message, and then for the extracted rpmsg:

• KeyView can detect that the file is Azure RMS encrypted (the kWindowsRMSEncrypted flag).
• When you configure credentials through Filter.configureRMS(), File Extraction API functions can operate on the protected data of the file. This allows you to extract the message body and attached files, but attached messages are not currently supported.
NOTE: Extraction of the XrML from the encrypted rpmsg is not supported.

Appendix K: OCR Supported Languages

KeyView OCR supports the following languages. In parentheses following each language name is the corresponding ISO 639-1 language code.

Latin Alphabet

Afrikaans (af), Basque (eu), Catalan (ca), Croatian (hr), Czech (cs), Danish (da), Dutch (nl), English (en), Esperanto (eo), Estonian (et), Finnish (fi), French (fr), German (de), Hungarian (hu), Icelandic (is), Italian (it), Irish (ga), Latin (la), Latvian (lv), Lithuanian (lt), Maltese (mt), Norwegian (no), Polish (pl), Portuguese (pt), Romanian (ro), Slovak (sk), Slovenian (sl), Spanish (es), Swedish (sv), Turkish (tr), Welsh (cy)

Arabic Alphabet

Arabic (ar), Persian (fa), Urdu (ur)

Chinese Alphabet

Simplified Chinese (zhs), Traditional Chinese (zht)

Cyrillic Alphabet

Bulgarian (bg), Macedonian (mk), Russian (ru), Serbian (sr), Ukrainian (uk)

Other Alphabets

Greek (el), Hebrew (he), Japanese (ja), Korean (ko), Thai (th)

Send documentation feedback

If you have comments about this document, you can contact the documentation team by email. If an email client is configured on this system, click the link above and an email window opens with the following information in the subject line:

Feedback on Micro Focus IDOL KeyView 12.13 Filter SDK Java Programming Guide

Add your feedback to the email and click Send. If no email client is available, copy the information above to a new message in a web mail client, and send your feedback to swpdl.idoldocsfeedback@microfocus.com.

We appreciate your feedback!