TOTECS Forums

Should we block all crawlers/bots from accessing our website hosted on the TOTECS platform?
Author Rowan Drew
April 17 2018 @ 01:59 PM
If your business hosts its website on the TOTECS platform, you will be concerned with the amount of traffic hitting that website, since the platform calculates your monthly hosting fee based on the total number of server requests made.

If your website is publicly available then any person in the world can access it, and so can any automated software, also known as "crawlers" or "bots". These crawlers and bots freely navigate the web looking for content, and if they find out that your website exists they will try to access every page and find all the content that you are publicly sharing. There are several ways that crawlers and bots can find your website. The easiest is by following a link to your website from a different website that they have already accessed (also known as crawling). Otherwise you may have directly notified them that your website exists. Crawlers and bots will periodically come back to your website to see if any content has been added, updated or removed since they last crawled your web pages.

There are many different crawlers and bots, run by a myriad of companies and people. The best known are those of the Google, Bing (Microsoft) and Yahoo search engines. But there are many other bots run by others, such as Russian and Chinese search engines, marketing companies, Facebook, your competitors, and even your customers wishing to extract details from your website.

With all these crawlers and bots chewing up precious server traffic when trawling your website, the natural thing to do is to just block them all. Unfortunately it's not as simple as that. Whilst TOTECS can block a great proportion of known bots if asked to do so, some of these bots are actually helpful to your business. For instance, Google's crawler will go through and find all the web pages and content you have publicly provided. It will then index/store this information in its search engine, so that when a person uses Google's search to find information that matches what your website has, there's a good chance they will be directed to your website. In this example Google is actually a sales lead generator. If you block Google's crawler from accessing your content, you could stop Google referring people to your website, and you could also affect its analytics software if that is embedded in your website. The same applies to the Bing and Yahoo bots.

Within the Administration Centre of websites hosted on the TOTECS platform, under the Statistics menu, in the Project Website Statistics interface you can see a list of the crawlers and bots known to TOTECS that are hitting your hosted websites. Using this list you can start to work out which bots are hitting your websites, how often and how much.

Before telling the TOTECS team to block all crawlers and bots from accessing your website, it's worth going through the list of bots and working out which ones are helping your business by generating leads and foot traffic, and which ones are wasting your website's time. Generally it's recommended to keep allowing the major search engine crawlers, such as Google and Bing, to access your website, since these search engines are used ubiquitously by western people in web browsers and in the Windows, Android and iOS operating systems. The same goes for Facebook and other social media crawlers, since they are used to find out about content when people link to your pages within their feeds. For all other search engines, and likewise for all other crawlers, you need to determine whether they are adding traffic to your website.
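The triage described above could be sketched roughly as follows. This is only an illustration in Python, not something the TOTECS platform provides: the CrawlerStats record, the crawlers_to_block function, the threshold and every number in the sample data are hypothetical, and you would substitute figures taken from your own statistics and referral reports.

```python
from dataclasses import dataclass

@dataclass
class CrawlerStats:
    name: str              # label of the crawler as shown in your statistics
    monthly_requests: int  # server requests the crawler made in a month
    referred_visits: int   # visits that arrived via that crawler's service

# Crawlers kept regardless of the numbers: major search engines and social media.
# (Illustrative allow-list only; adjust to suit your business.)
ALWAYS_ALLOW = {"Googlebot", "bingbot", "facebookexternalhit"}

def crawlers_to_block(stats, min_visits_per_1000_requests=1.0):
    """Return names of crawlers that use requests but refer little traffic."""
    blocked = []
    for s in stats:
        if s.name in ALWAYS_ALLOW or s.monthly_requests == 0:
            continue
        visits_per_1000 = s.referred_visits * 1000 / s.monthly_requests
        if visits_per_1000 < min_visits_per_1000_requests:
            blocked.append(s.name)
    return blocked

# Made-up sample data, purely for illustration.
stats = [
    CrawlerStats("Googlebot", 12000, 900),
    CrawlerStats("bingbot", 4000, 3),
    CrawlerStats("ExampleMarketingBot", 30000, 0),
    CrawlerStats("ExampleRegionalSearch", 2000, 40),
]
print(crawlers_to_block(stats))
```

With this sample data only ExampleMarketingBot ends up on the block list: it makes the most requests and refers no visitors, while ExampleRegionalSearch stays because it still sends some traffic your way.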

Once you have worked this out, advise the TOTECS team which crawlers and bots you would like to block. To answer the original question: you typically only want to block all known crawlers if you want to take sole responsibility for generating leads and for how people find and access your website.

Note how I said earlier that your customers or competitors may also use or hire bots to trawl your website. This leads to another important point: the TOTECS platform can only detect crawlers and bots that identify themselves as such. The majority of legitimate companies run crawlers and bots that identify themselves, however other actors may choose not to do this. So while we try our best here to block crawlers and bots where we can, we can't block bots that are not easily recognised, or that change their name (known as a user-agent). For this reason it's generally advisable to only make content available on your public website that you are OK to have picked up by anyone. Though we can also place blocks on traffic from certain countries, there is always a way for crawlers, bots and people to access content by using VPNs to change their apparent country of origin.
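To show why name-based detection has this limit, here is a minimal Python sketch. It is not how the TOTECS platform is implemented: the is_known_bot function and the short token list are assumptions made for illustration, and a real block list would be far longer.

```python
# Illustrative tokens that some well-known crawlers include in their user-agent.
KNOWN_BOT_TOKENS = ("googlebot", "bingbot", "yandexbot", "baiduspider")

def is_known_bot(user_agent: str) -> bool:
    """True if the user-agent string contains a token from the known list."""
    ua = user_agent.lower()
    return any(token in ua for token in KNOWN_BOT_TOKENS)

# A crawler that identifies itself honestly.
honest = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# A bot sending an ordinary browser user-agent instead of its own name.
disguised = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/120.0 Safari/537.36")

print(is_known_bot(honest))     # True
print(is_known_bot(disguised))  # False
```

The second request looks exactly like a person using a web browser, so a check based on the user-agent alone has nothing to match against.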


TOTECS Software Development Manager
