Preface

Nowadays the Internet consists of billions of Web pages all over the world. There are no road signs that direct visitors to your site, so unless you have the money for a huge advertising campaign, you depend on search engines. Search engines are used to find specific information on the Internet. They constantly crawl the Internet and index millions of pages per day. You can easily add your page to a search engine's work list, and within a couple of weeks the search engine will crawl your page. However, the search engine uses a robot or spider: a program that reads your page and tries to work out on its own what your site is about. The chance that your site is indexed correctly that way is slim. There are, however, ways to help and direct a search spider so that the chances of Internet users finding your site in search engines increase dramatically. There are two methods of directing a search engine on your site: the robots.txt file and META tags.
The META tags help the spider get the correct information about a specific web page.

Robots.txt

The robots.txt file is used by search engine spiders to see what they may or may not include in their index. The robots.txt file must always be located at the root of the website; you cannot make a separate robots.txt for a specific part of the website. An example is my own at: http://www.schaake.nu/robots.txt

The robots.txt file uses two directives. The first, User-Agent, names the user agent (spider) the following rules apply to, so only that user agent will look at them. The second, Disallow, restricts access to a specific directory or file on the website. Let's explain this with a sample:

    # Some stuff we don't want google to see
    User-Agent: Googlebot
    Disallow: /googlesecrets.html
    Disallow: /cgi-bin

    # All the other agents may also not index the cgi-bin
    User-Agent: *
    Disallow: /cgi-bin

With this example, all search engines will index the whole site except /cgi-bin, and only Googlebot will also skip the /googlesecrets.html page. Note that Disallow: /cgi-bin is repeated in the Googlebot group: a spider obeys only the group that matches its own user agent and ignores the * group, so the Googlebot group must list every restriction that applies to it.

Now what if we want the complete site indexed, because we don't have any secrets at all?

    # Allow complete access
    User-Agent: *
    Disallow:

Or we could forbid one agent to index our site completely:

    # Disallow the googlebot completely
    User-Agent: googlebot
    Disallow: /

Note that not all spiders read the robots.txt file. Most of the big commercial search engines respect it, but spam robots (which harvest email addresses) will not be stopped by it.

META tags

The META tags contain information about a specific web page. Most spiders will read the META tags and use this information instead of trying to collect information about the page themselves. The drawback is that when your META tags contain incorrect or outdated information, the spider will use that information instead of looking at the page itself.
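The robots.txt samples from the Robots.txt section can be checked mechanically. The following is a minimal sketch using Python's standard-library urllib.robotparser; the agent name SomeOtherBot, the paths /index.html and /cgi-bin/search.cgi, and the helper names robots_url and allowed are made up for illustration. urllib.robotparser is a simple implementation (the first matching group and the first matching rule win), so it is not guaranteed to behave exactly like any particular search engine.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# First sample from the text: Googlebot gets its own group,
# every other crawler falls back to the "*" group.
SAMPLE = """\
# Some stuff we don't want google to see
User-Agent: Googlebot
Disallow: /googlesecrets.html
Disallow: /cgi-bin

# All the other agents may also not index the cgi-bin
User-Agent: *
Disallow: /cgi-bin
"""

# Second sample: an empty Disallow line means nothing is forbidden.
ALLOW_ALL = """\
# Allow complete access
User-Agent: *
Disallow:
"""

# Third sample: Googlebot may not fetch anything, other agents are unrestricted.
BLOCK_GOOGLEBOT = """\
# Disallow the googlebot completely
User-Agent: googlebot
Disallow: /
"""


def robots_url(page_url: str) -> str:
    """robots.txt always lives at the root of the site, whatever page we start from."""
    return urljoin(page_url, "/robots.txt")


def allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` according to the given robots.txt text."""
    parser = RobotFileParser()
    # parse() accepts the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    print(robots_url("http://www.schaake.nu/some/deep/page.html"))
    for agent in ("Googlebot", "SomeOtherBot"):
        for path in ("/index.html", "/googlesecrets.html", "/cgi-bin/search.cgi"):
            print(f"{agent:13} {path:22} {allowed(SAMPLE, agent, path)}")
```

With the first sample, Googlebot is refused both /googlesecrets.html and /cgi-bin/search.cgi, while SomeOtherBot is refused only /cgi-bin/search.cgi, matching the explanation above.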
The following META tags can be used to help a spider.

    <META NAME="description" CONTENT="Description of the webpage"/>

The description tag holds a short description of the web page. Keep it consistent with the <TITLE> tag in the page header; some search engines still look at the title tag instead of the description META tag!

    <META NAME="keywords" CONTENT="keyword1 keyword2 keyword3"/>

To help a spider collect keywords for your site, you can include the keywords META tag. This tag contains useful keywords that Internet users might search for to find your Web page. Keywords are separated with a space.

    <META NAME="robots" CONTENT="index,follow"/>

The robots.txt file can forbid spiders to look at specific pages or complete directories, but sometimes you want more control over the spider. The robots META tag gives you that control for each individual page. The first part of the value tells the spider whether the current page may be indexed; the second part tells the spider whether it may follow the hyperlinks in the current page. Possible options are:

    index     the page may be indexed
    noindex   the page may not be indexed
    follow    the links in the page may be followed
    nofollow  the links in the page may not be followed
    all       same as index,follow
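A spider only sees these tags as attributes in the page header. The sketch below, built on Python's standard html.parser module, shows one way a crawler might collect the NAME/CONTENT pairs and interpret the robots value. The class and function names (MetaCollector, robots_directives) and the sample page are hypothetical; real search engines use their own parsers and rules.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect NAME/CONTENT pairs from <META> tags (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so <META NAME=...>
        # arrives as tag "meta" with attribute "name". Self-closing <META .../>
        # tags also end up here, via the default handle_startendtag.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def robots_directives(content: str) -> tuple[bool, bool]:
    """Return (may_index, may_follow) for a robots META CONTENT value.

    Indexing and following are allowed unless explicitly switched off,
    so "all" and "index,follow" both give (True, True).
    """
    words = {w.strip().lower() for w in content.split(",") if w.strip()}
    return "noindex" not in words, "nofollow" not in words


PAGE = """<html><head>
<META NAME="description" CONTENT="Description of the webpage"/>
<META NAME="keywords" CONTENT="keyword1 keyword2 keyword3"/>
<META NAME="robots" CONTENT="noindex,follow"/>
</head><body><p>Hello</p></body></html>"""

if __name__ == "__main__":
    collector = MetaCollector()
    collector.feed(PAGE)
    collector.close()
    print(collector.meta["description"])
    print(collector.meta["keywords"].split())  # keywords are separated with a space
    # A page without a robots tag is treated as index,follow.
    print(robots_directives(collector.meta.get("robots", "index,follow")))
```

For the sample page, the spider may follow the links but must not index the page itself.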
For example:

    <META NAME="robots" CONTENT="all"/>

    <META NAME="refresh" CONTENT="3600"/>

The refresh META tag tells the spider to refresh the page after the given number of seconds (3600 seconds is one hour). This directive could be useful for internal search engines, but I see no reason why a public search engine would re-index its content for your specific page that often; it takes weeks before a search engine visits your site again anyway.

    <META NAME="revisit-after" CONTENT="30"/>

This directive makes more sense, although I'm not sure any search engine actually looks at it. The example above tells the search engine to revisit the site after 30 days. So if a search engine would normally plan a revisit after 14 days, it can wait another 16 days before revisiting your site, which saves bandwidth.

    <META NAME="generator" CONTENT="Microsoft Frontpage"/>

This META tag tells the spider which web design tool was used to generate or design this Web page. A search engine could use this to build statistics on the usage of design tools.

    <META NAME="language" CONTENT="nl, en"/>

This META tag defines the language(s) used on the Web page. Normally a spider will try to detect the language itself, but with this tag you can specify it explicitly.

    <META NAME="copyright" CONTENT="Copyright 2003 Christiaan Schaake."/>
    <META NAME="author" CONTENT="Christiaan Schaake"/>

These two META tags tell the spider who wrote the page and who holds its copyright. A search engine could include this information in the search results.

Not all search engines look at the META tags, so always use plain text for the important parts of your site. Do not make a welcome page that consists only of a big image or a Shockwave animation. And make use of the title and alt attributes!
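When the same tags have to go into many pages, writing them by hand is error-prone. Here is a minimal sketch in Python that renders the META lines from a dictionary, escaping quotes and ampersands with the standard html.escape function. The function name meta_block is hypothetical, and the values in the usage example are the ones shown on this page.

```python
from html import escape


def meta_block(tags: dict[str, str]) -> str:
    """Render one <META NAME=... CONTENT=.../> line per name/content pair.

    Both values are HTML-escaped, so a quote or ampersand inside CONTENT
    cannot break the attribute.
    """
    return "\n".join(
        f'<META NAME="{escape(name)}" CONTENT="{escape(content)}"/>'
        for name, content in tags.items()
    )


if __name__ == "__main__":
    # Dictionaries keep insertion order, so the tags come out in this order.
    print(meta_block({
        "revisit-after": "30",
        "language": "nl, en",
        "copyright": "Copyright 2003 Christiaan Schaake.",
        "author": "Christiaan Schaake",
    }))
```

Generating the tags from one place also makes it easier to keep them up to date, which matters because spiders trust the META information over the page itself.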
© Christiaan Schaake | Last update: January 16, 2011