

What is robots.txt?

May 23, 2019 am 11:01 AM

Robots.txt is the first file that search engines look at when they visit a website. It is a plain-text file that specifies which parts of a site search engines may crawl. When a search spider visits a site, it first checks whether robots.txt exists in the site's root directory. If it does, the spider determines the scope of its visit from the contents of the file.


When building a website, we often have content that we do not want search engines to crawl or that we do not want to appear in search results. How do we tell search engines not to crawl that content? This is where robots.txt comes in handy.

Robots.txt is the first file that search engines look at when visiting a website. It tells the spider which files on the server it may view.

When a search spider visits a site, it first checks whether robots.txt exists in the site's root directory. If it exists, the spider determines the scope of its access from the contents of the file; if it does not exist, all search spiders can access every page on the website that is not password protected.
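The lookup described above can be sketched in Python. This is only an illustration of where a spider looks, not how any particular search engine implements it; `robots_url` is a hypothetical helper name and the example.com address is a placeholder.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; robots.txt sits at the root path.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://www.example.com/blog/post.html?id=7"))
```

Whatever page the spider starts from, the path and query string are discarded and the file is requested from the root of the same host.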

Syntax: The simplest robots.txt file uses two rules:

- User-agent: The robot to which the following rules apply

- Disallow: The pages (URL paths) to be blocked

But we need to pay attention to a few points:

1. Robots.txt must be stored in the root directory of the website.

2. The file must be named robots.txt, and the file name must be all lowercase.

3. Robots.txt is the first file a search engine requests when it visits the website.

4. Robots.txt must specify a User-agent.
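The two rules can be tried out with the `urllib.robotparser` module from Python's standard library. This is a minimal sketch: the `/private/` directory, the `ExampleBot` name, and the example.com URLs are made up for illustration, and real search engines may interpret rules somewhat differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical two-rule file: applies to every robot, blocks one directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask whether a robot named "ExampleBot" may fetch two different pages.
private_ok = parser.can_fetch("ExampleBot", "https://www.example.com/private/report.html")
index_ok = parser.can_fetch("ExampleBot", "https://www.example.com/index.html")

print(private_ok)  # False: the path starts with /private/
print(index_ok)    # True: no rule matches this path
```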

robots.txt Misunderstandings

Misunderstanding 1: All files on my website need to be crawled by spiders, so there is no need for me to add the robots.txt file. Anyway, if the file does not exist, all search spiders will be able to access all pages on the website that are not password protected by default.

Whenever a user attempts to access a URL that does not exist, the server will record a 404 error (file cannot be found) in the log. Whenever a search spider looks for a robots.txt file that does not exist, the server will also record a 404 error in the log, so you should add a robots.txt to your website.

Misunderstanding 2: Allowing search spiders to crawl every file in robots.txt can increase the website's indexing rate.

Even if the website's program scripts, style sheets and other such files are indexed by spiders, this will not increase the website's indexing rate and will only waste server resources. Therefore, you should use the robots.txt file to keep search spiders from indexing these files.

Specific files that need to be excluded are detailed in the article Tips on Using Robots.txt.

Misunderstanding 3: Search spiders waste server resources when crawling web pages, so robots.txt should be set to block all search spiders from crawling any web page.

If this is the case, the entire website will not be indexed by search engines.
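The difference between allowing everything (Misunderstanding 1) and blocking everything (Misunderstanding 3) comes down to a single character. The sketch below uses Python's standard `urllib.robotparser`; `can_crawl`, `ExampleBot`, and the example.com URL are hypothetical names chosen for illustration.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Parse robots_txt and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# An empty Disallow value blocks nothing; "Disallow: /" blocks the whole site.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
PAGE = "https://www.example.com/products/index.html"

print(can_crawl(ALLOW_ALL, PAGE))  # True
print(can_crawl(BLOCK_ALL, PAGE))  # False
```

A site that wants to be indexed but also wants to avoid 404 entries for robots.txt can serve the allow-all form.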

robots.txt usage tips

1. Whenever a user tries to access a URL that does not exist, the server records a 404 error (file cannot be found) in the log. Whenever a search spider looks for a robots.txt file that doesn't exist, the server also records a 404 error in the log, so you should add a robots.txt to your site.

2. Website administrators should keep spider programs away from certain directories on the server to protect server performance. For example, most web servers store programs in the "cgi-bin" directory, so it is a good idea to add "Disallow: /cgi-bin" to the robots.txt file. This prevents program files from being indexed by spiders and saves server resources. Files that generally do not need to be crawled by spiders include: background management files, program scripts, attachments, database files, encoding files, style sheet files, template files, navigation pictures and background pictures.

The following is the robots.txt file in VeryCMS:

User-agent: *
Disallow: /admin/       # Background management files
Disallow: /require/     # Program files
Disallow: /attachment/  # Attachments
Disallow: /images/      # Pictures
Disallow: /data/        # Database files
Disallow: /template/    # Template files
Disallow: /css/         # Style sheet files
Disallow: /lang/        # Encoding files
Disallow: /script/      # Script files
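A file like the one above can also be generated from a list of directories rather than written by hand. The following Python sketch is one possible approach, not part of VeryCMS; `build_robots_txt` is a hypothetical helper, only three of the directories are shown, and the result is checked with the standard `urllib.robotparser` module.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(blocked: dict[str, str], agent: str = "*") -> str:
    """Build robots.txt text that disallows each path, with a # comment per rule."""
    lines = [f"User-agent: {agent}"]
    for path, note in blocked.items():
        lines.append(f"Disallow: {path}  # {note}")
    return "\n".join(lines) + "\n"


# A subset of the directories listed above, with their descriptions.
rules = {
    "/admin/": "Background management files",
    "/data/": "Database files",
    "/css/": "Style sheet files",
}
text = build_robots_txt(rules)
print(text)

# Round-trip check: the generated rules block /admin/ but not the home page.
parser = RobotFileParser()
parser.parse(text.splitlines())
admin_ok = parser.can_fetch("ExampleBot", "https://www.example.com/admin/login.php")
home_ok = parser.can_fetch("ExampleBot", "https://www.example.com/index.html")
```

Writing each description after a `#` keeps it a comment, so parsers read only the path as the rule.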

3. If your website has dynamic web pages and you create static copies of them to make them easier for search spiders to crawl, you need to add rules to the robots.txt file that keep the dynamic pages from being indexed, so that they are not treated as duplicate content.

4. The robots.txt file can also directly include links to the sitemap file. Like this:

Sitemap: http://www.***.com/sitemap.xml

The search engine companies that currently support this include Google, Yahoo, Ask and MSN; Chinese search engine companies are not yet among them. The advantage is that the webmaster does not need to visit the webmaster tools or similar sections of each search engine to submit a sitemap file. The search engine spider crawls the robots.txt file, reads the sitemap path in it, and then crawls the linked web pages.

5. Proper use of the robots.txt file can also avoid errors during access. For example, you should not let searchers land directly on the shopping cart page. Since there is no reason for the shopping cart to be indexed, you can add a rule to the robots.txt file that keeps searchers from entering the shopping cart page directly.
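How a spider picks up the sitemap path can be sketched with Python's standard `urllib.robotparser`, whose `site_maps()` method is available from Python 3.8. The file contents and the example.com URL below are placeholders, not taken from a real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks one directory and advertises a sitemap.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns a list of sitemap URLs, or None if the file names none.
sitemaps = parser.site_maps()
print(sitemaps)
```

A crawler would then fetch each listed URL and read the page links from the sitemap file.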
