Configure external sites as content sources in SharePoint Search


Hi,

In my previous post (Creating and configuring Search service application), I explained how to create and configure the Search service application. SharePoint Search can also index content from external sites; all it needs is a new content source. Follow the steps below to create a content source for an external site.

To get to the Manage Content Sources page

  1. Verify that the user account that is performing this procedure is a service application administrator for the Search service application.
  2. On the Home page of the SharePoint Central Administration Web site, in the Application Management section, click Manage service applications.
  3. On the Manage Service Applications page, click Search Service Application.
  4. On the Search Administration page, in the Crawling section, click Content Sources.

The Manage Content Sources page opens and lists all the available content sources.

To create a content source

  1. On the Manage Content Sources page, click New Content Source.
  2. On the Add Content Source page, in the Name section, in the Name box, type a name for the new content source, for example “External Sites”.
  3. In the Content Source Type section, select Web Sites.
  4. In the Start Addresses section, in the Type start addresses below (one per line) box, type the URLs from which the crawler should begin crawling. For example: http://example.internetsite.com
  5. In the Crawl Settings section, select “Only crawl within the server of each start address”.
  6. In the Crawl Schedules section, to specify a schedule for full crawls, select a defined schedule from the Full Crawl list. A full crawl crawls all content that is specified by the content source, regardless of whether the content has changed. To define a full crawl schedule, click Create schedule.
  7. To specify a schedule for incremental crawls, select a defined schedule from the Incremental Crawl list. An incremental crawl crawls only the content in the content source that has changed since the last crawl. To define a schedule, click Create schedule. You can change a defined schedule by clicking Edit schedule.
  8. To prioritize this content source, in the Content Source Priority section, on the Priority list, select Normal or High.
  9. To immediately begin a full crawl, in the Start Full Crawl section, select the Start full crawl of this content source check box, and then click OK.
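To make steps 5 to 7 a bit more concrete, here is a small Python sketch of the ideas behind them. It is not how SharePoint implements crawling; it only illustrates what "only crawl within the server of each start address" and "full vs. incremental crawl" mean conceptually. The function names, the page data, and the dates are all made up for the illustration, and the start address is the example URL from step 4.

```python
from datetime import datetime
from urllib.parse import urlsplit

# Hypothetical start address, taken from the example in step 4.
START_ADDRESSES = ["http://example.internetsite.com"]


def same_server(url, start_addresses=START_ADDRESSES):
    """Return True if url is on the same host as any start address.

    Conceptual stand-in for "Only crawl within the server of each
    start address"; the real crawler's rules may differ in detail.
    """
    host = (urlsplit(url).hostname or "").lower()
    allowed = {(urlsplit(a).hostname or "").lower() for a in start_addresses}
    return host != "" and host in allowed


def select_for_crawl(pages, last_crawl, full):
    """Pick the URLs to fetch.

    pages maps URL -> last-modified time (made-up data for illustration).
    A full crawl takes every in-scope page; an incremental crawl takes
    only in-scope pages modified after last_crawl.
    """
    in_scope = {u: t for u, t in pages.items() if same_server(u)}
    if full:
        return sorted(in_scope)
    return sorted(u for u, t in in_scope.items() if t > last_crawl)


if __name__ == "__main__":
    pages = {
        "http://example.internetsite.com/": datetime(2024, 1, 1),
        "http://example.internetsite.com/news": datetime(2024, 3, 1),
        "http://other.example.org/": datetime(2024, 3, 1),
    }
    last = datetime(2024, 2, 1)
    print(select_for_crawl(pages, last, full=True))
    print(select_for_crawl(pages, last, full=False))
```

In this toy data set the page on the other host is never selected, the full crawl picks up both pages on the start address host, and the incremental crawl picks up only the page modified after the last crawl.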

This completes the creation of the new content source for external sites. However, not every internet-facing site allows anonymous access to its content. For such sites, a crawl rule lets you specify the credentials that the crawler should use for a given URL while crawling.

To create one, go to Search Service Application > Crawl Rules.

Click New Crawl Rule, and then fill in the following sections:

Path: Enter the URL of the internet site.

Crawl configuration: Select the “Include all items in this path” option.

Specify Authentication: Select “Specify a different content access account”, and then enter the credentials for the site.

Click OK.
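If it helps to picture what a crawl rule does, the Python sketch below models the three settings above: a path, an include/exclude choice, and an optional account. It is only a mental model, not SharePoint code. The `CrawlRule` class, the helper functions, the account names, and the `*` wildcard in the path are all assumptions made for the illustration, as is the "first matching rule wins" behaviour.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class CrawlRule:
    path: str                      # URL pattern; the "*" wildcard is an assumption here
    include: bool = True           # models "Include all items in this path"
    account: Optional[str] = None  # None means the default content access account


def rule_for(url, rules):
    """Return the first rule whose path pattern matches url, else None."""
    for rule in rules:
        if fnmatchcase(url.lower(), rule.path.lower()):
            return rule
    return None


def account_for(url, rules, default_account="DOMAIN\\default-crawl"):
    """Return the account to crawl url with, or None if a rule excludes it."""
    rule = rule_for(url, rules)
    if rule is None:
        return default_account   # no rule applies: use the default account
    if not rule.include:
        return None              # an exclude rule: do not crawl this URL
    return rule.account or default_account


if __name__ == "__main__":
    # Hypothetical rule for the external site used in this post.
    rules = [
        CrawlRule("http://example.internetsite.com/*",
                  account="DOMAIN\\external-reader"),
    ]
    print(account_for("http://example.internetsite.com/docs/a.html", rules))
    print(account_for("http://intranet.local/page", rules))
```

The point of the model is simply that URLs under the rule's path are crawled with the account you entered, while everything else keeps using the default content access account.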

The last step is to start a crawl of the “External Sites” content source. Once the crawl completes, the external site's content should appear in SharePoint search results.

Happy SharePointing 🙂

7 thoughts on “Configure external site as content sources in sharepoint search”

  1. Hi

    I followed the setup above for my external sites, and the logs show that the crawls ran.
    But when I search, no results are returned. Any suggestions?

    Thanks
    Avinesh
