As a SharePoint administrator, you already know that logs are generated while your content is crawled, showing whether everything went fine or whether there were issues (content access, unknown or unsupported file types…).
Unfortunately, there is NO convenient way to access these logs and work on them outside of SharePoint (for example, by exporting them).
Recently, I worked for a customer with a huge volume of data to index. We had to detect the many file types SharePoint did not recognize as well as every crawl error (the logs showed about 10,000 errors and warnings), and browsing the logs from Central Administration was neither simple nor practical.
The following applies to both the SharePoint search engine and FAST Search Server 2010 for SharePoint.
With SharePoint 2010, almost everything is stored in databases, including the crawl logs.
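Open SQL Server Management Studio, connect to the instance hosting your search databases, and create a new query.
Then enter the following query, replacing the placeholders with the name of your search crawl store database and the error ID you are interested in: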
SELECT u.crawlid, u.accessurl, u.errorid, e.errormsg
FROM [<your search crawl store database>].[dbo].[MSSCrawlURL] u
JOIN [<your search crawl store database>].[dbo].[MSSCrawlErrorList] e
    ON u.errorid = e.errorid
WHERE u.errorid = <error id>
You can then export the result as a CSV file (in Management Studio, right-click the results grid and choose Save Results As…) and work on the logs in Excel.
Adapt the u.errorid value to the error you want to retrieve. The available error IDs and their messages are stored in the [<your search crawl store database>].[dbo].[MSSCrawlErrorList] table.
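To decide which errorid values are worth exporting, it can help to first count how many crawled URLs fall under each error. The query below is a sketch that only reuses the columns from the query above (errorid, errormsg); the url_count alias is illustrative, and it assumes each row of MSSCrawlURL stands for one crawled URL. The LEFT JOIN keeps IDs that have no entry in MSSCrawlErrorList (their message then comes back as NULL).

```sql
-- Sketch: number of crawled URLs per errorid, most frequent first.
-- Replace the placeholder with the name of your crawl store database.
SELECT c.errorid, e.errormsg, c.url_count
FROM (
    -- Aggregate on errorid only, so the message column never has to be grouped.
    SELECT u.errorid, COUNT(*) AS url_count
    FROM [<your search crawl store database>].[dbo].[MSSCrawlURL] u
    GROUP BY u.errorid
) c
LEFT JOIN [<your search crawl store database>].[dbo].[MSSCrawlErrorList] e
    ON c.errorid = e.errorid
ORDER BY c.url_count DESC;
```

Pick the errorid values with the highest counts from this list, then filter the first query on them to export the matching URLs.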
Sample error codes and their explanations (the first column is the errorid value used in the query, the last one is the error message):
| errorid | Error code | | | errormsg |
|---|---|---|---|---|
| 1 | 266755 | 0 | 0 | The content did not change. |
| 2 | 1 | 1 | 0 | Deleted by the gatherer |
| 3 | -2147218169 | 1 | 1 | Item not crawled due to one of the following reasons: Preventive crawl rule; Specified content source hops/depth exceeded; URL has query string parameter; Required protocol handler not found; Preventive robots directive. |
| 4 | -2147218168 | 1 | 0 | The specified address was excluded from the index. The site hops or page depth restrictions may have to be modified to include this address. |
| 5 | 265520 | 1 | 0 | The filtering process could not load the item. This is possibly caused by an unrecognized item format or item corruption. |
| 6 | 265582 | 1 | 0 | This item comprises multiple parts and/or may have attachments. Not all of these parts were indexed. They may either be invalid or deliberately skipped (e.g. images). The remote server may also have been unresponsive while indexing these parts. |
| 7 | 265616 | 1 | 0 | The content for this address was excluded by the crawler because this item was marked with a no-index meta-tag. To index this item, remove the meta-tag and recrawl. |
| 8 | 265626 | 1 | 0 | This document is a child of another document. It will not be cataloged separately. |
| 9 | -2147217977 | 1 | 0 | Removed from the search index by Admin. This item will be excluded from future crawls. |
| 10 | 265674 | 1 | 0 | The URL was permanently moved. |
| 11 | 265699 | 1 | 0 | The FAST Search backend reported warnings when processing the item. |
| 12 | -2147217948 | 1 | 0 | This URL is part of a host header SharePoint deployment and the search application is not configured to crawl individual host header sites. This will be crawled as a part of the host header Web application if configured as a start address. |
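Since one goal here was to spot file types SharePoint cannot handle, a natural follow-up is to group the URLs of a single error by file extension. The sketch below rests on several assumptions: that errorid 5 ("The filtering process could not load the item…") is the error of interest, that accessurl is a regular (n)varchar column, and that the extension is whatever follows the last dot of the last path segment of the URL. URLs with query strings or without an extension are only grouped loosely, so adjust the filter and the expression to your data.

```sql
-- Sketch: count URLs for one errorid by file extension.
-- Assumption: errorid 5 is the "could not load the item" error from the table above.
SELECT x.extension, COUNT(*) AS url_count
FROM (
    SELECT
        CASE
            -- Take the text after the last '.', but only when that dot sits
            -- after the last '/', i.e. inside the file name.
            WHEN CHARINDEX('.', REVERSE(u.accessurl)) > 0
             AND CHARINDEX('.', REVERSE(u.accessurl)) < CHARINDEX('/', REVERSE(u.accessurl))
            THEN LOWER(RIGHT(u.accessurl, CHARINDEX('.', REVERSE(u.accessurl)) - 1))
            ELSE '(none)'
        END AS extension
    FROM [<your search crawl store database>].[dbo].[MSSCrawlURL] u
    WHERE u.errorid = 5
) x
GROUP BY x.extension
ORDER BY url_count DESC;
```

The derived table computes the extension once, so the outer query can group and sort on it without repeating the CASE expression.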