Guides and tutorials

Hundreds of tutorials and step-by-step guides carefully written by our support team.

How to create a robots.txt file?

What is it? The robots.txt file is a plain text file that must meet the robots exclusion standard.

You can create the file with Windows Notepad and save it with the name robots.txt.

This file consists of one or more rules, each of which blocks or allows a specific crawler access to a specific file path on a website.
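If you prefer to generate the file from a script rather than Notepad, here is a minimal Python sketch; the two rules it writes are placeholders only and should be adapted to your own site.

# Minimal sketch: write an illustrative robots.txt in the current directory.
# The rules below are placeholders; adapt them to your own site.
rules = "\n".join([
    "User-agent: *",                      # applies to every crawler
    "Disallow: /wp-admin/",               # block this folder
    "Allow: /wp-admin/admin-ajax.php",    # but keep this file crawlable
    "",                                   # trailing newline
])

with open("robots.txt", "w", encoding="utf-8", newline="\n") as f:
    f.write(rules)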

The robots.txt file is used to manage crawler traffic to your site.

It is mainly used to prevent crawler requests from overloading your website. With a well-configured robots.txt file, you can avoid several indexers visiting your site at the same time and adversely affecting the speed of your website or even the Cloud itself.

What do we block? The crawler, also known as a tracker, spider, robot or bot: a program that analyzes the documents on a website. Search engines use very powerful crawlers that navigate and analyze websites, building a database with the information they collect.

What elements make up robots.txt? When generating the robots.txt file, you have to take into account its specific commands and rules.

Commands

User-agent: This is the command used to specify the search engine robots/spiders that we allow to crawl our website.

The syntax of this command is: User-agent: (robot name)

 (In each rule there must be at least one Disallow or Allow entry)

Disallow: Indicates a directory or page of the root domain that you do not want the user-agent to crawl.

Allow: Indicates the directories or pages of the root domain that the user-agent specified in the group may crawl. It is used to override the Disallow directive and allow a specific subdirectory or page of a blocked directory to be crawled.

One option is to use an asterisk, which means that all search engines are allowed to crawl the website.

User-agent: *
Disallow

The following command instructs search engines not to crawl, access or index a specific part of the website, such as the wp-admin folder.

Disallow: /wp-admin/
Allow

With the following command you indicate the opposite: you tell search engines what they can crawl. In this example, only one file from a specific folder is allowed.

Allow: /wp-admin/admin-ajax.php

Other elements to consider.

When adding elements to block, place a forward slash (/) at the beginning of the path, and also at the end when blocking a directory.
The rules can also be simplified with wildcards (see the sketch after this list):
    *. The asterisk is used to match any sequence of characters.
    $. The dollar symbol is used when you want to block URLs with a specific ending.
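As a rough illustration of how * and $ behave, the following Python sketch translates a rule path into a regular expression and tests it against a few URL paths. The pattern_matches helper is hypothetical and only approximates the common search engine interpretation of these wildcards; it is not any crawler's exact implementation.

import re

def pattern_matches(rule_path, url_path):
    # * matches any run of characters; $ anchors the rule to the end of the URL.
    regex = re.escape(rule_path).replace(r"\*", ".*")
    if regex.endswith(r"\$"):
        regex = regex[:-2] + "$"
    # Rules without $ are prefix matches, which re.match already provides.
    return re.match(regex, url_path) is not None

print(pattern_matches("/*.pdf$", "/docs/manual.pdf"))        # True
print(pattern_matches("/*.pdf$", "/docs/manual.pdf?page=2")) # False
print(pattern_matches("/wp-admin/", "/wp-admin/post.php"))   # True (prefix)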

Examples of commands used in robots.txt.

Exclude all robots from the server:

User-agent: *
Disallow: /

Allow all robots to access and crawl everything:

User-agent: *
Disallow:

Exclude only one bot, in this case BadBot:

User-agent: BadBot
Disallow: /

Allow only one bot, in this case Google:

User-agent: Google
Disallow:
User-agent: *
Disallow: /

Exclude a directory for all bots:

User-agent: *
Disallow: /directory-name/

Exclude a specific page:

User-agent: *
Disallow: /page-url.html

Block all the images on the website:

User-agent: Googlebot-Image
Disallow: /

Block a single image for one bot only:

User-agent: Googlebot-Image
Disallow: /images/blocked.jpeg

Exclude a specific file type:

User-agent: Googlebot
Disallow: /*.jpeg$

Exclude URLs with a specific ending:

User-agent: *
Disallow: /*.pdf$

These are examples of use; pick the one that fits your needs or create your own.
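Before publishing the file, you can sanity-check simple rules locally with Python's standard urllib.robotparser module. Note that it implements the classic exclusion standard, so it may not reproduce every search engine's handling of wildcards or rule precedence; the rules and URLs below are placeholders.

import urllib.robotparser

# Placeholder rules and URLs, parsed directly from memory.
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("*", "https://www.example.com/private/file.html"))  # False
print(parser.can_fetch("*", "https://www.example.com/public/page.html"))   # True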

Once the robots.txt file has been created, upload it via FTP into the /tudominio/datos/web/ directory.
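As a sketch of that upload step with Python's standard ftplib, assuming ftp.example.com and the credentials are placeholders for your own FTP details:

from ftplib import FTP

ftp = FTP("ftp.example.com")                  # placeholder host
ftp.login(user="username", passwd="password") # placeholder credentials
ftp.cwd("/tudominio/datos/web/")              # directory indicated above

with open("robots.txt", "rb") as f:
    ftp.storbinary("STOR robots.txt", f)      # upload the file

ftp.quit()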
