public marks

PUBLIC MARKS from holyver with tags doc & "file format"

28 March 2008 22:00

Indexable File Formats

File Formats the Google Search Appliance and Google Mini Crawl and Index

The following table lists word processing, spreadsheet, database, presentation, and other formats that the Google Search Appliance and Google Mini can crawl, index, and search. Please note the following:

* The Google Mini and Google Search Appliance cannot crawl, index, or search any file formats that are not listed.
* Text embedded in graphics is not indexed. The Google Search Appliance and Google Mini cannot index text contained in graphic file formats, such as JPEG, GIF, or TIFF. When a file in a graphic format is submitted for indexing, the file name is still indexed, and so is any metadata associated with the graphic in HTML meta tags.
* Encrypted, viewable PDF documents are converted to HTML for indexing, but the cached HTML is not displayed.
* PDF files created by scanning with optical character recognition (OCR) software are supported.
* If you are using the Google Search Appliance, metadata can be fed from a database and then indexed.
* Files in XML format cannot be crawled or indexed.
* The contents of compressed file formats, such as ZIP or tar files, cannot be indexed.
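
The notes above amount to a small set of rules keyed on file type. The sketch below is only an illustration of those rules in Python, not anything the appliance exposes: the function name `index_treatment`, the extension spellings, and the returned labels are all assumptions made for this example. Because the table of supported formats is not reproduced here, the caller has to pass it in as `listed_formats`.

```python
from pathlib import PurePosixPath

# Formats named in the notes; the exact extension spellings are assumptions.
GRAPHIC = {".jpg", ".jpeg", ".gif", ".tif", ".tiff"}
COMPRESSED = {".zip", ".tar"}
XML = {".xml"}


def index_treatment(filename, listed_formats):
    """Return a hypothetical label describing how a file would be treated.

    listed_formats is a set of lowercase extensions (e.g. {".pdf"}) standing in
    for the table of supported formats, which is not reproduced here.
    """
    ext = PurePosixPath(filename).suffix.lower()
    if ext in XML:
        return "not crawled or indexed"
    if ext in COMPRESSED:
        return "contents not indexed"
    if ext in GRAPHIC:
        # Embedded text is skipped; the name and HTML meta tags are kept.
        return "file name and HTML metadata only"
    if ext in listed_formats:
        return "crawled and indexed"
    # Anything not in the table cannot be crawled, indexed, or searched.
    return "not crawled or indexed"


if __name__ == "__main__":
    supported = {".pdf", ".doc"}  # placeholder values, not the real table
    for name in ["report.pdf", "scan.TIFF", "backup.tar", "feed.xml", "notes.xyz"]:
        print(name, "->", index_treatment(name, supported))
```

Checking the special cases before the supported table keeps the exceptions in one place, so the placeholder table can be swapped for the real one without touching the rules.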