How to create a robots.txt file?

**What is it?** The robots.txt file is a plain text file that must comply with the robots exclusion standard.

You can create the file with Windows Notepad and save it under the name robots.txt.

This file consists of one or more rules and each rule blocks or allows a particular crawler to access a particular file path on a website.

The robots.txt file is used to manage crawler traffic to your site.

It is used to prevent the requests your website receives from overloading it. With the robots.txt file properly configured, you can prevent the speed of your website, or even the Cloud itself, from being negatively affected when several of these indexers visit at the same time.

**What do we block?** The crawler, also known as a spider, robot or bot: a program that analyzes website documents. Search engines use very powerful crawlers that browse and analyze websites, building a database from the information collected.

**What elements make up the robots.txt file?** When generating the robots.txt file, you must take specific commands and rules into account.

**Commands**

User-agent: This is the command used to specify which search-engine robots/spiders are allowed to crawl our website.

The syntax of this command is: User-agent: (name of the robot)

(In each rule, there must be at least one entry Disallow or Allow)

Disallow: Indicates a directory or a page of the root domain that you do not want the user-agent to crawl.

Allow: Specifies the directories or pages in the root domain that the user-agent specified in the group should crawl. It is used to override the Disallow directive and allow a specific subdirectory or page of a blocked directory to be crawled.

One option is to use an asterisk, which means that all search engines are allowed to crawl the site:

User-agent: *
Disallow:

The following command tells search engines not to crawl, access or index a specific part of the website, such as the wp-admin folder:

Disallow: /wp-admin/

With the following command you indicate the opposite: you mark what search engines can crawl. In this example, only one file from an otherwise blocked folder is allowed:

Allow: /wp-admin/admin-ajax.php
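If you want to sanity-check a rule set before publishing it, Python's standard-library urllib.robotparser can parse the rules and answer per-URL queries. This is a rough check only, a sketch rather than an authoritative validator: the stdlib parser applies rules in file order and does not understand wildcards, so here it is fed only the simple Disallow rule from above.

```python
from urllib.robotparser import RobotFileParser

# The rule set from the example above: block /wp-admin/ for every bot.
rules = [
    "User-agent: *",
    "Disallow: /wp-admin/",
]

rp = RobotFileParser()
rp.parse(rules)

# Paths under /wp-admin/ are blocked; everything else is allowed.
print(rp.can_fetch("TestBot", "https://example.com/wp-admin/options.php"))  # False
print(rp.can_fetch("TestBot", "https://example.com/blog/post.html"))        # True
```

The user-agent name "TestBot" and the URLs are placeholders; substitute your own bot names and paths.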

**Other elements to take into account**

When blocking a directory, you must place a slash (/) at the beginning and end of its name.
The code can be simplified:
    *. The asterisk is used to block a sequence of characters.
    $. The dollar sign is used when you want to block URLs with a specific ending.
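As an illustration of how wildcard-aware crawlers interpret these two characters, here is a small Python sketch. The helper name robots_pattern_to_regex is made up for this example, not part of any library; it translates a robots.txt path pattern into an equivalent regular expression.

```python
import re

def robots_pattern_to_regex(pattern):
    # Hypothetical helper: translate a robots.txt path pattern into a regex.
    # '*' matches any sequence of characters; a trailing '$' anchors the
    # pattern to the end of the URL path. Without '$', robots rules are
    # prefix matches, which re.match already gives us.
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in core)
    return re.compile(regex + ("$" if anchored else ""))

# '/*.jpeg$' blocks any path ending in .jpeg, but not other extensions.
print(bool(robots_pattern_to_regex("/*.jpeg$").match("/images/photo.jpeg")))  # True
print(bool(robots_pattern_to_regex("/*.jpeg$").match("/images/photo.png")))   # False
```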

**Examples of commands used in robots.txt**

Exclude all robots from the server:

User-agent: *
Disallow: /

Allow all robots to have access to scan everything:

User-agent: *
Disallow:

Exclude only one bot, in this case BadBot:

User-agent: BadBot
Disallow: /

Allow only one bot, in this case Googlebot:

User-agent: Googlebot
Disallow:
User-agent: *
Disallow: /

Exclude a directory for all bots:

User-agent: *
Disallow: /directory-name/

Exclude a specific page:

User-agent: *
Disallow: /page-url.html

Block all images on the website:

User-agent: Googlebot-Image
Disallow: /

Block a single image for one bot only:

User-agent: Googlebot-Image
Disallow: /image/blocked.jpeg

Exclude a specific file type:

User-agent: Googlebot
Disallow: /*.jpeg$

Exclude URLs with a specific ending:

User-agent: *
Disallow: /*.pdf$

These are examples of use; pick the one that suits your needs or create your own.

Once you have created the robots.txt file, upload it via FTP into the /yourdomain/data/web/ directory.

More than 2000 m² of own facilities and Data Centers in Spain