Drupal 6 XML Sitemap for Nodes
Posted: Fri 05/02/2008 by ramiro
After upgrading to Drupal 6 I opted for a quick and dirty XML sitemap approach. Previously I used the XML Sitemap module, which is currently available for Drupal 6 as a development snapshot or directly from CVS. The module offers settings for priority and change frequency. It also allows adding taxonomy term and user URLs to the sitemap.
I only wanted nodes and the front page to appear in the sitemap's XML output, without priority or change frequency information. With the path and pathauto modules enabled, every node gets a meaningful and search engine friendly URL, so a simple database query joining two tables is enough to get the necessary data for all published nodes.
Code Snippets
To make the sitemap reachable via a URL, a menu item of the type MENU_CALLBACK goes into the menu hook of a module named custom. The menu hook changed in Drupal 6, as did the whole menu system, which was completely rewritten by chx. To learn more about it, read the menu module documentation.
<?php
function custom_menu() {
  $items = array();
  // Register the path "sitemap"; MENU_CALLBACK keeps it out of the visible menus.
  $items['sitemap'] = array(
    'title' => 'XML Sitemap',
    'access arguments' => array('access content'),
    'type' => MENU_CALLBACK,
    'page callback' => 'custom_sitemap',
  );
  return $items;
}
?>
When the URL /sitemap is requested, the function custom_sitemap() is called. The SQL query joins the node and url_alias tables to retrieve the modified dates and URL aliases of all published nodes, which are stored in an associative array called $urls. The URL aliases 403 and 404, which are used for custom error pages, are omitted from the array.
The XML output string is then put together from a here document and a foreach loop and printed out.
<?php
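function custom_sitemap() {
  // Absolute base URL of the site, including the trailing slash.
  // $_SERVER['HTTPS'] may be unset, or 'off' on IIS, for plain HTTP requests.
  $https = !empty($_SERVER['HTTPS']) && $_SERVER['HTTPS'] != 'off';
  $base = ($https ? 'https://' : 'http://') . $_SERVER['SERVER_NAME'] . base_path();

  // Collect absolute URL => last modified timestamp for all published nodes.
  $urls = array();
  $result = db_query("SELECT ua.dst, n.changed FROM {node} n INNER JOIN {url_alias} ua ON ua.src = CONCAT('node/', n.nid) WHERE n.status = 1 ORDER BY n.changed DESC");
  while ($r = db_fetch_object($result)) {
    // Skip the aliases of the custom error pages. Compare as strings so that
    // aliases merely starting with these digits are not dropped as well.
    if ($r->dst !== '404' && $r->dst !== '403') {
      $urls[$base . $r->dst] = $r->changed;
    }
  }

  header('Content-Type: text/xml');
  $xml = <<<EOF
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>$base</loc><changefreq>daily</changefreq></url>
EOF;
  foreach ($urls as $url => $t) {
    $xml .= '<url>';
    // Escape characters such as & that are not allowed verbatim in XML.
    $xml .= '<loc>' . check_plain($url) . '</loc>';
    $xml .= '<lastmod>' . date('Y-m-d', $t) . '</lastmod>';
    $xml .= '</url>';
  }
  $xml .= '</urlset>';
  print $xml;
  exit();
}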
?>
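The string assembly in custom_sitemap() can also be sketched as a standalone function that does not depend on Drupal, which makes it easy to try outside a site. This is only a minimal sketch under assumptions: the function name custom_sitemap_build_xml() and the example.com URLs are hypothetical and not part of the module above, htmlspecialchars() stands in for escaping the URLs, and gmdate() is used instead of date() so the result does not depend on the server time zone.

```php
<?php
// Hypothetical helper, not part of the original module: builds the sitemap
// XML from a base URL and an array of absolute URL => Unix timestamp.
function custom_sitemap_build_xml($base, array $urls) {
  $xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
  $xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
  // Front page entry, mirroring the callback above.
  $xml .= '<url><loc>' . htmlspecialchars($base, ENT_QUOTES, 'UTF-8') . '</loc>';
  $xml .= '<changefreq>daily</changefreq></url>' . "\n";
  foreach ($urls as $url => $timestamp) {
    $xml .= '<url>';
    // Escape &, <, > and quotes so the output stays well-formed XML.
    $xml .= '<loc>' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '</loc>';
    // Date in UTC, formatted as YYYY-MM-DD.
    $xml .= '<lastmod>' . gmdate('Y-m-d', $timestamp) . '</lastmod>';
    $xml .= '</url>' . "\n";
  }
  $xml .= '</urlset>';
  return $xml;
}

// Usage with made-up sample data (the URL and timestamp are assumptions).
$xml = custom_sitemap_build_xml('http://example.com/', array(
  'http://example.com/about?a=1&b=2' => 1209686400,
));
print $xml;
```

The & in the sample URL ends up as &amp; inside the loc element. In the page callback, the returned string would be printed after the Content-Type header has been sent.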


I will not be in Berlin when the next DUG meeting takes place. See you at Linuxtag. Greetings, Ramiro