Below you can find meta tags (many of them little known) suitable for inclusion in your HTML document. These tags allow better indexing by robot-driven search engines, such as AltaVista and Infoseek. Some HTML editors will generate some of these tags automatically.

Title (Strongly Recommended)
Your document's title will appear in users' hotlists, the banner of most browsers, and robot-generated lists. It should be a concise, one-line summary of what the page is about. Bear in mind that users may not reach your document through your homepage, but directly via a search engine or a link at another site, so the title should ideally be self-sufficient.

Keywords (Recommended)
Space-separated list of keywords for indexing your document. Some robots look at keywords in context, so it is best to preserve word order and case, e.g. "free advertising Italy Tuscany" rather than "Italy free Tuscany advertising".
Example: <META NAME="keywords" CONTENT="free advertising Italy Tuscany">

Description (Recommended)
The description is presented to the user along with the document's title as the result of a search. Many robots use the first few lines of text as a description if the Description tag is not present. For documents using frames, there may be no such text at all.
Example: <META NAME="description" CONTENT="Tuscany, Italy Free Advertising">

Owner (Recommended)
Legacy value. Some browsers (e.g. Lynx) use this to mail the document author.
Example: <META NAME="owner" CONTENT="Azpire Incorporated">

Expiry Date (Optional)
The date after which the listing may be deleted. The default is never. This is also used by Netscape and proxy servers to delete documents from the cache. If you know your page will go stale, setting an expiry date is probably a good idea. Netscape Navigator honours the META tag; other agents and proxies may require the HTTP header. Netscape 3 will cache a document with an "Expires: 0" tag, but will issue a GET with If-Modified-Since (regardless of option settings), and thus retrieve the updated copy.
Example: <META NAME="expires" CONTENT="15days">
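
Since some agents and proxies may require the real HTTP header rather than the META tag, it can help to see what such a header might look like. The sketch below is only an illustration: the function name `expires_header` and the idea of deriving the date from a number of days are assumptions made here, not part of the original text. It formats an absolute date in the standard HTTP date form using Python's standard library.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from typing import Optional


def expires_header(days: int, now: Optional[datetime] = None) -> str:
    """Build an HTTP "Expires" header line `days` days after `now`.

    `expires_header` is a hypothetical helper used for illustration only.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    expiry = now + timedelta(days=days)
    # usegmt=True requires a UTC-aware datetime and emits the "GMT" suffix
    # that HTTP dates use.
    return "Expires: " + format_datetime(expiry, usegmt=True)


if __name__ == "__main__":
    start = datetime(2000, 1, 13, 22, 43, tzinfo=timezone.utc)
    print(expires_header(15, start))  # Expires: Fri, 28 Jan 2000 22:43:00 GMT
```

How a server is configured to send this header depends on the server software and is outside the scope of this sketch.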

Object Type (Recommended)
Allows a document to be searched for in a particular category. Recommended where the document is indexed elsewhere (ISSN, Library of Congress, etc.). Below is a list of possible categories known and accepted by search engines:
Document, Homepage, World, Realworld, FAQ, RFC, Magazine, Mall, Dictionary, Archive, SearchEngine, Hypercatalog, Keybank, Manual, Index, Book, Database, Journal, Catalog, Linecard, HOWTO

Revisit Interval (Optional)
Controls how often your document is revisited by a search engine robot.
Example: <META NAME="revisit after" CONTENT="15days">

Rating (Optional)
A complement to (not a substitute for) the PICS system; modelled after the familiar Motion Picture ratings, to indicate the violence, language and adult content of a document.
Example: <META HTTP-EQUIV="PICS-Label" CONTENT='(PICS-1.1 "http://www.rsac.org/ratingsv01.html" l gen true comment "RSACi North America Server" for "http://www.azpire.com" on "2000.01.13T14:43-0800" r (n 0 s 0 v 0 l 0))'>

Language (Optional)

Dialect (Optional)

Charset (Optional)
Charsets may be specified by the server; for instance: Content-type: text/html; charset=iso-8859-5. Netscape 2.0 works properly with this method; unfortunately, most other browsers break. Netscape 3 will use a META tag to automatically switch fonts (X11 Netscape, at least), and provided the server does not parse HTTP-EQUIV META tags into real HTTP headers, other browsers will ignore it. The META tag method is therefore recommended for character sets other than ISO-8859-1 (Western European), as it will cause Netscape to select the correct font for each page. The default HTML charset is ISO-8859-1 (Western European 8-bit).
Example: <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">

Robots (Recommended)
See the workshop report at W3 for the full text.
<META NAME="ROBOTS" CONTENT="ALL | NONE | NOINDEX | NOFOLLOW">
default = empty = "ALL"
"NONE" = "NOINDEX, NOFOLLOW"
The content is a comma-separated list of terms: ALL, NONE, INDEX, NOINDEX, FOLLOW, NOFOLLOW.

Discussion: This tag is meant for users who cannot control the robots.txt file at their sites. It provides a last chance to keep their content out of search services. It was decided not to add syntax to allow robot-specific permissions within the meta tag.

INDEX means that robots are welcome to include this page in search services. FOLLOW means that robots are welcome to follow links from this page to find other pages. So a value of "NOINDEX" allows the subsidiary links to be explored, even though the page is not indexed. A value of "NOFOLLOW" allows the page to be indexed, but no links from the page are explored (this may be useful if the page is a free entry point into pay-per-view content, for example). A value of "NONE" tells the robot to ignore the page.
Example: <META NAME="robots" CONTENT="all">
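
To make the meaning of the robots terms concrete, here is a minimal sketch of how a robot might interpret the CONTENT value. The function name `parse_robots` is hypothetical, and three behaviours are assumptions rather than anything the text above specifies: terms are matched case-insensitively, unknown terms are ignored, and when terms conflict the later one wins.

```python
from typing import Tuple


def parse_robots(content: str) -> Tuple[bool, bool]:
    """Return (index, follow) for a robots META CONTENT value.

    `parse_robots` is a hypothetical helper used for illustration only.
    """
    index, follow = True, True  # default = empty = "ALL"
    for term in content.split(","):
        term = term.strip().upper()  # assumption: case-insensitive matching
        if term == "ALL":
            index, follow = True, True
        elif term == "NONE":  # "NONE" = "NOINDEX, NOFOLLOW"
            index, follow = False, False
        elif term == "INDEX":
            index = True
        elif term == "NOINDEX":
            index = False
        elif term == "FOLLOW":
            follow = True
        elif term == "NOFOLLOW":
            follow = False
        # Assumption: any other term (including the empty string) is ignored.
    return index, follow


if __name__ == "__main__":
    print(parse_robots("NOINDEX"))   # (False, True): links explored, page not indexed
    print(parse_robots("NOFOLLOW"))  # (True, False): page indexed, links not explored
    print(parse_robots("none"))      # (False, False): page ignored
```

The first element of the result says whether the page may be included in search services, the second whether links from it may be followed.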