Configure external sites as content sources in SharePoint search


Hi,

In my previous post (Creating and configuring Search service application), I explained how to create and configure the Search service application. SharePoint search can also index the content of external sites; all it needs is a new content source. Follow the steps below to create a content source for an external site.

To get to the Manage Content Sources page

  1. Verify that the user account that is performing this procedure is a service application administrator for the Search service application.
  2. On the Home page of the SharePoint Central Administration Web site, in the Application Management section, click Manage service applications.
  3. On the Manage Service Applications page, click Search Service Application.
  4. On the Search Administration Page, in the Crawling section, click Content Sources.

                   

Clicking the link opens the Manage Content Sources page, which lists all the available content sources.

To create a content source

  1. On the Manage Content Sources page, click New Content Source.
  2. On the Add Content Source page, in the Name section, in the Name box, type a name for the new content source, for example “External Sites”.
  3. In the Content Source Type section, select “Web Sites”.
  4. In the Start Addresses section, in the Type start addresses below (one per line) box, type the URLs from which the crawler should begin crawling. For example: http://example.internetsite.com
  5. In the Crawl Settings section, select “Only crawl within the server of each start address”.
  6. In the Crawl Schedules section, to specify a schedule for full crawls, select a defined schedule from the Full Crawl list. A full crawl crawls all content that is specified by the content source, regardless of whether the content has changed. To define a full crawl schedule, click Create schedule.
  7. To specify a schedule for incremental crawls, select a defined schedule from the Incremental Crawl list. An incremental crawl crawls only the content specified by the content source that has changed since the last crawl. To define a schedule, click Create schedule. You can change a defined schedule by clicking Edit schedule.
  8. To prioritize this content source, in the Content Source Priority section, on the Priority list, select Normal or High.
  9. To immediately begin a full crawl, in the Start Full Crawl section, select the Start full crawl of this content source check box, and then click OK.
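
To make the “Only crawl within the server of each start address” setting from step 5 more concrete, here is a minimal Python sketch of the idea: a discovered link is in scope only if its host matches the host of one of the start addresses. This is a toy model, not SharePoint's implementation; the function names `host_of` and `in_crawl_scope` are made up for illustration, and the real crawler may treat protocols, ports, and redirects differently.

```python
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the lower-cased host name of a URL (scheme, port and path are ignored)."""
    return (urlsplit(url).hostname or "").lower()


def in_crawl_scope(link: str, start_addresses: list[str]) -> bool:
    """Return True if the link is on the same server as any start address.

    Hypothetical helper: it only models the idea behind the
    "Only crawl within the server of each start address" option.
    """
    allowed_hosts = {host_of(start) for start in start_addresses}
    link_host = host_of(link)
    # Links without a host (for example relative paths) are treated as out of scope here.
    return link_host != "" and link_host in allowed_hosts


start_addresses = ["http://example.internetsite.com"]

# A page on the same server as the start address.
print(in_crawl_scope("http://example.internetsite.com/news/today.html", start_addresses))
# A page on a different server.
print(in_crawl_scope("http://other.internetsite.com/", start_addresses))
```

With this setting, links that point to other servers are simply not followed, so each additional external site needs its own start address.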

                      

This completes the creation of the new content source for external sites. However, not all internet-facing sites allow anonymous access to their content. For such sites, we can configure a crawl rule for the site URL and supply the credentials the crawler should use to access the content while crawling.

Navigate to Search Service Application > Crawl Rules.

Click “New Crawl Rule”.

Path: Enter the URL of the internet site in the Path section.

Crawl configuration: Select the “Include all items in this path” option.

Specify Authentication: Select “Specify a different content access account” and enter the credentials of an account that can access the site.

Click OK.
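
As a rough mental model of what the crawl rule does, the Python sketch below checks a URL against a list of rules and returns the first rule whose path matches, together with the account that rule specifies. It rests on several assumptions: the `CrawlRule` class, the helper functions, and the account name `EXTERNAL\crawler` are all hypothetical, `*` is treated as a simple wildcard, and rules are assumed to be evaluated in the order they are listed. It does not call any SharePoint API.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CrawlRule:
    path: str                      # URL pattern; "*" matches any run of characters
    include: bool                  # True models "Include all items in this path"
    account: Optional[str] = None  # hypothetical name of a non-default content access account


def matches(rule: CrawlRule, url: str) -> bool:
    """Return True if the whole URL matches the rule's path pattern (case-insensitive)."""
    # Escape the pattern so that only "*" acts as a wildcard.
    regex = re.escape(rule.path).replace(r"\*", ".*")
    return re.fullmatch(regex, url, flags=re.IGNORECASE) is not None


def first_matching_rule(rules: List[CrawlRule], url: str) -> Optional[CrawlRule]:
    """Return the first rule that matches the URL, or None (assumes top-down evaluation)."""
    for rule in rules:
        if matches(rule, url):
            return rule
    return None


rules = [
    # Hypothetical rule mirroring the settings chosen above.
    CrawlRule(path="http://example.internetsite.com/*", include=True, account="EXTERNAL\\crawler"),
]

rule = first_matching_rule(rules, "http://example.internetsite.com/members/page.aspx")
if rule is not None and rule.include:
    print("crawl with account:", rule.account)
```

A URL that matches no rule would, in this model, be crawled with the default content access account.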

The last step is to run a crawl of the “External Sites” content source so that SharePoint can crawl and index the external site content.

Happy SharePointing 🙂

16 thoughts on “Configure external sites as content sources in SharePoint search”

  1. Hi

    I followed the above setup for my external sites, and I can see in the logs that the crawls ran.
    When I search, it doesn’t return any results. Any suggestions?

    Thanks
    Avinesh

  2. I’m doing this for multiple sites. Any idea how I can create a refiner that lets me select results by site?
    For example, I’m crawling news.***, calendar.***, and www.***. Only www is a SharePoint site.

  3. Hi, what would happen to the dynamic content of a site? For example, the external site has a page that fetches data from SQL on load.

    1. Hello Vikas,

      Thanks for referring to my post. If the site has dynamic content, it will be picked up by the next full or incremental crawl. It does not matter whether the data is fetched from SQL on load or from a service. SharePoint only holds references to the pages that exist on the external website.

      Hope that answers your question.

      Regards
      Mohit

  4. Asking questions are actually good thing if you are not
    understanding anything entirely, however this piece of writing presents nice understanding even.

  5. Excellent post. I was checking constantly this blog and I’m impressed!
    Very useful info specially the last part 🙂 I care for such info a lot.
    I was looking for this particular information for a long time.
    Thank you and best of luck.

  6. I have a doubt: why do we need to select “Specify a different content access account”, and which account details do we need to enter? Are they the external site’s account details, or something else?
