Thursday, 28 November 2013

SharePoint Crawls - Full, Incremental, Continuous

Full Crawl

A full crawl of a content source re-indexes all of its content from the beginning.

Important Points to Consider:
  • If a new managed property has been introduced, you need to run a full crawl of the content source.
  • If crawl rules are created, updated, or deleted, a full crawl of the content source is required.
  • If an incremental crawl has failed.
  • If a software update or service pack has been installed on the servers.
  • It is expensive in terms of performance.
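A full crawl can also be started on demand from PowerShell. The sketch below assumes the SharePoint 2013 Management Shell and the same "Local SharePoint sites" content source used in the continuous crawl script later in this post. Start-ContentSourceFullCrawl is a hypothetical helper name; it relies on the content source object's CrawlStatus property (assumed to report "Idle" when no crawl is running) and its StartFullCrawl() method.

```powershell
# Hypothetical helper: starts a full crawl only when the content source is idle,
# because a full crawl and an incremental crawl cannot run at the same time.
function Start-ContentSourceFullCrawl {
    param(
        [Parameter(Mandatory = $true)]
        $ContentSource
    )

    # Assumption: CrawlStatus is "Idle" when no crawl is running on this content source
    if ($ContentSource.CrawlStatus -ne "Idle") {
        Write-Warning "Content source '$($ContentSource.Name)' is busy ($($ContentSource.CrawlStatus)); full crawl not started."
        return $false
    }

    # StartFullCrawl() re-indexes all content in the content source from the beginning
    $ContentSource.StartFullCrawl()
    return $true
}

# Usage (in the SharePoint 2013 Management Shell):
# $ssa = Get-SPEnterpriseSearchServiceApplication
# $cs  = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa -Identity "Local SharePoint sites"
# Start-ContentSourceFullCrawl -ContentSource $cs
```

Checking the status first keeps the script from trying to start a full crawl while another crawl is still running on the same content source.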

Incremental Crawl

An incremental crawl of a content source processes only the items that have changed since the last crawl.

Important Points to Consider:
  • Preferred once a full crawl has been completed.
  • Has less impact on performance, because it crawls only modified documents rather than the entire content source.
  • An incremental crawl retries items that failed and postpones crawling an item if the error persists.

A limitation of full and incremental crawls is that they cannot run in parallel on the same content source. For example, if a full crawl is already running, an incremental crawl cannot be started until the full crawl completes. Even if you stop the full crawl, at least one full crawl must still complete successfully before any incremental crawl can be started.
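Because of this limitation, a script that needs an incremental crawl can poll the content source until the running crawl has finished and only then start it. A minimal sketch, assuming the SharePoint 2013 Management Shell and that CrawlStatus reports "Idle" when no crawl is running; Start-IncrementalCrawlWhenIdle and its polling and timeout defaults are hypothetical choices, not SharePoint settings.

```powershell
# Hypothetical helper: waits until the content source is idle (for example, until a
# running full crawl completes), then starts an incremental crawl.
function Start-IncrementalCrawlWhenIdle {
    param(
        [Parameter(Mandatory = $true)] $SearchApplication,
        [Parameter(Mandatory = $true)] [string]$Name,
        [int]$PollSeconds = 60,
        [int]$TimeoutMinutes = 240
    )

    $deadline = (Get-Date).AddMinutes($TimeoutMinutes)

    # Re-read the content source on every pass so the crawl status is current
    $cs = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $SearchApplication -Identity $Name
    while ($cs.CrawlStatus -ne "Idle") {
        if ((Get-Date) -gt $deadline) {
            throw "Timed out waiting for '$Name' to become idle (status: $($cs.CrawlStatus))."
        }
        Start-Sleep -Seconds $PollSeconds
        $cs = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $SearchApplication -Identity $Name
    }

    # No other crawl is running, so the incremental crawl can be started
    $cs.StartIncrementalCrawl()
}

# Usage (in the SharePoint 2013 Management Shell):
# $ssa = Get-SPEnterpriseSearchServiceApplication
# Start-IncrementalCrawlWhenIdle -SearchApplication $ssa -Name "Local SharePoint sites"
```

Fetching the content source again inside the loop avoids depending on a status value that might be cached on an object retrieved before the full crawl finished.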

To address this, Microsoft introduced the continuous crawl in SharePoint 2013.

Continuous Crawl

With continuous crawls you can keep the content index as fresh as possible.
  • More than one continuous crawl can run in parallel.
  • One deep change does not degrade the freshness of all the changes that follow it.

The performance impact of a continuous crawl is the same as that of an incremental crawl.

When crawls run in parallel, the continuous crawl stays within the limits defined in the crawler impact rule, which controls the maximum number of simultaneous requests the crawler sends to a server (8 by default).

Enable Continuous Crawl using PowerShell

#Get Search Service Application(SSA)
$ssa = Get-SPEnterpriseSearchServiceApplication

#Get the Content Source for which you want to enable continuous crawl
$cs = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa -Identity "Local SharePoint sites"

#Set the EnableContinuousCrawls property to true
Set-SPEnterpriseSearchCrawlContentSource -Identity $cs -EnableContinuousCrawls $True

#Set the interval in minutes for the whole Search Service Application. This step is optional: by default SharePoint runs continuous crawls every 15 minutes. Here the interval is set to 30 minutes
$interval = 30

$ssa.SetProperty("ContinuousCrawlInterval", $interval)
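To confirm the settings took effect, the content source flag and the interval can be read back. A minimal sketch, assuming the content source exposes an EnableContinuousCrawls property and that the Search Service Application object offers a GetProperty() counterpart to the SetProperty() call used above; Get-ContinuousCrawlSetting is a hypothetical helper name.

```powershell
# Hypothetical helper: reports whether continuous crawls are enabled on a content
# source and which continuous crawl interval (in minutes) the SSA is using.
function Get-ContinuousCrawlSetting {
    param(
        [Parameter(Mandatory = $true)] $SearchApplication,
        [Parameter(Mandatory = $true)] $ContentSource
    )

    [pscustomobject]@{
        ContentSource          = $ContentSource.Name
        EnableContinuousCrawls = $ContentSource.EnableContinuousCrawls
        # Assumption: GetProperty() reads the value written by SetProperty()
        IntervalMinutes        = $SearchApplication.GetProperty("ContinuousCrawlInterval")
    }
}

# Usage, after running the script above:
# Get-ContinuousCrawlSetting -SearchApplication $ssa -ContentSource $cs
```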


Disable Continuous Crawl using PowerShell

Use the same PowerShell script as above, but change the line that sets the EnableContinuousCrawls property so that it passes $False:

Set-SPEnterpriseSearchCrawlContentSource -Identity $cs -EnableContinuousCrawls $False
