Drupal 6 XML Sitemap for Nodes

After upgrading to Drupal 6 I opted for a quick and dirty XML sitemap approach. Before that I used the XML Sitemap module, which is currently available for Drupal 6 as a development snapshot or directly from CVS. The module offers settings for priority and change frequency, and it can also add taxonomy term and user URLs to the sitemap.

I only wanted nodes and the front page to appear in the sitemap's XML output, without priority or change frequency information. With the path and pathauto modules enabled, every node gets a meaningful and search-engine-friendly URL alias, so a simple database query joining two tables is enough to get the necessary data for all published nodes.

Code Snippets

To make the sitemap reachable via a URL, a menu item of type MENU_CALLBACK goes into the menu hook of a module named custom. The menu hook changed in Drupal 6, as did the whole menu system, which was completely rewritten by chx. To learn more about it, read the menu module documentation.

<?php
/**
 * Implementation of hook_menu().
 */
function custom_menu() {
  $items = array();
  // Register the path "sitemap" without adding a visible menu link.
  $items['sitemap'] = array(
    'title' => 'XML Sitemap',
    'access arguments' => array('access content'),
    'type' => MENU_CALLBACK,
    'page callback' => 'custom_sitemap',
  );
  return $items;
}
?>

When the URL /sitemap is requested, the function custom_sitemap() is called. The SQL query joins the node and url_alias tables to retrieve the modification dates and URL aliases of all published nodes, which are stored in an associative array called $urls. The URL aliases 403 and 404, which are used for custom error pages, are omitted from the array.

The function then assembles the XML output string from a here document and a foreach loop and prints it.

<?php
/**
 * Menu callback: prints an XML sitemap of the front page and all published nodes.
 */
function custom_sitemap() {
  // Base URL of the site; the empty() check avoids a notice when HTTPS is unset.
  $https = !empty($_SERVER['HTTPS']) && $_SERVER['HTTPS'] != 'off';
  $base = ($https ? 'https://' : 'http://') . $_SERVER['SERVER_NAME'] . base_path();

  // Map each aliased URL of a published node to its last modification time.
  $urls = array();
  $result = db_query("SELECT ua.dst, n.changed FROM {node} n INNER JOIN {url_alias} ua ON ua.src = CONCAT('node/', n.nid) WHERE n.status = 1 ORDER BY n.changed DESC");
  while ($r = db_fetch_object($result)) {
    // Skip the aliases of the custom error pages.
    if ($r->dst != '404' && $r->dst != '403') {
      $urls[$base . $r->dst] = $r->changed;
    }
  }

  header('Content-Type: text/xml');

  $xml = <<<EOF
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>$base</loc><changefreq>daily</changefreq></url>
EOF;
  foreach ($urls as $url => $t) {
    $xml .= '<url>';
    $xml .= '<loc>' . $url . '</loc>';
    $xml .= '<lastmod>' . date("Y-m-d", $t) . '</lastmod>';
    $xml .= '</url>';
  }
  $xml .= '</urlset>';
  print $xml;
  exit();
}
?>

Exactly what I was looking for. And then I see that it's you. Nice. Was it also you who was mentioned in the sprydev podcast? I'm looking forward to the next DUG meeting! I'll see you there. Bob

I won't be in Berlin when the next DUG meeting takes place. See you at Linuxtag. Greetings, Ramiro

Thank you very much for that information! I am planning to update my Drupal 5.X portals to 6.2 in the next few days and was looking for such a sitemap solution.

Interesting. Do I put the first code snippet in a .module file? And where do I put the second snippet? Thanks for helping me out!

Thanks, I hope that helps me out.

Hey guys, this code works like a charm, thanks a lot Ramiro. I took the liberty of making a Drupal module out of it so that not everyone has to copy and paste the code. I hope you don't mind (if you do, please tell me and I'll remove it). You'll find the module here: http://dinaiz.blogspot.com/2008/06/easily-add-sitemap-to-drupal-cms-6x.html

That's fine with me, Dinaiz. The reason I did not put this code into a module of its own is that there is already an XML Sitemap module for Drupal, which gives you more control than my code snippet.

Anyone, including myself, who is comfortable with such a simple approach can use this code, or now your module, which is good.

Hey Ramiro! OK about the already existing "XML sitemap" module. I think yours is better, because the results it produces make more sense, especially when used together with pathauto and aliases. By the way, my module is actually your code packaged into a module, so it doesn't make any difference whether you use your code or my module. So the bottom line is, thanks for this piece of code, which helped me and, apparently, many other people too :-) P.S.: The module page is now at: http://dinaiz-two-dot-zero.blogspot.com/2008/07/easily-add-sitemap-to-dr...

My URLs look like this:
http://www.mydomain.ch/en/products

So I had to add the language path prefix to the sitemap's URLs.

I applied the following changes:

replaced:
$result = db_query("SELECT ua.dst, n.changed FROM {node} n INNER JOIN {url_alias} ua ON ua.src = CONCAT( 'node/', n.nid ) WHERE n.status =1 ORDER BY n.changed DESC");

$urls[$base . $r->dst] = $r->changed;

with the following:
$result = db_query("SELECT ua.dst, n.changed, n.language FROM {node} n INNER JOIN {url_alias} ua ON ua.src = CONCAT( 'node/', n.nid ) WHERE n.status =1 ORDER BY n.changed DESC");

$urls[$base . $r->language . '/' . $r->dst] = $r->changed;
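
One thing to watch with the change above: a node without a language would probably end up with a double slash in its URL. The following is only a minimal sketch of how that case could be handled. The helper name custom_sitemap_url() is made up for illustration, and it assumes that language-neutral nodes have an empty language column and that the path prefix is the same as the language code, as in the setup described above.

```php
<?php
/**
 * Hypothetical helper (not part of the original snippet): builds a sitemap
 * URL and only adds the language prefix when the node has a language.
 */
function custom_sitemap_url($base, $language, $alias) {
  // Assumption: language-neutral nodes have an empty (or NULL) language.
  $prefix = ($language !== '' && $language !== NULL) ? $language . '/' : '';
  return $base . $prefix . $alias;
}
```

Inside the while loop it would be used as $urls[custom_sitemap_url($base, $r->language, $r->dst)] = $r->changed;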

Could you possibly explain where those code blocks need to be placed? I believe Taco was wondering as well...

In the example both code snippets go into a single file called custom.module, which should be placed in its own folder in the directory where you install contributed modules, e.g. /sites/all/modules/custom. You also need a .info file, which would be called custom.info. See the documentation on .info files.
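
For orientation, a minimal custom.info for Drupal 6 could look roughly like the sketch below. The name and description values are placeholders chosen for illustration, not taken from the article; only the core line has to match the Drupal version.

```ini
; custom.info (placeholder values, adjust to taste)
name = Custom
description = Site-specific code, including the XML sitemap callback.
core = 6.x
```

With custom.info and custom.module in the same folder, the module can be enabled on the modules administration page.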

It seems that the priority settings in an XML sitemap can make a big difference to how Google indexes and ranks web pages, and that this can have an effect quickly.

I get "This XML file does not appear to have any style information associated with it...". Any ideas?

When do you get that message?

While accessing the sitemap at xyz.com/sitemap.xml?

Do you see the XML output? Which browser are you using?

Big fan, thanks for this breakdown!

As an aside, this code could be better optimized by putting the while loop in place of the foreach statement. As it is written, you are parsing through the returned result set twice: once to create an array, and then looping through that same array to generate the XML. On large pages, that could be resource costly.

Good point, the additional foreach loop should be avoided.
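
A rough sketch of the single-pass idea suggested above, written as plain PHP so it can be tried outside Drupal. The function name custom_sitemap_xml() and the $rows parameter are assumptions made for this illustration; each row is expected to be an object with dst and changed properties, like the ones db_fetch_object() returns in the article's code. The htmlspecialchars() calls are an addition of this sketch, not part of the original snippet, so that characters such as & in a URL do not break the XML.

```php
<?php
/**
 * Hypothetical single-pass variant: appends each row to the XML string as it
 * is read, instead of first collecting all URLs in an intermediate array.
 */
function custom_sitemap_xml($base, $rows) {
  $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
  $xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
  $xml .= '<url><loc>' . htmlspecialchars($base) . '</loc><changefreq>daily</changefreq></url>' . "\n";
  foreach ($rows as $r) {
    // Skip the aliases used for the custom error pages.
    if (in_array($r->dst, array('403', '404'), TRUE)) {
      continue;
    }
    $xml .= '<url>';
    $xml .= '<loc>' . htmlspecialchars($base . $r->dst) . '</loc>';
    $xml .= '<lastmod>' . date('Y-m-d', $r->changed) . '</lastmod>';
    $xml .= '</url>' . "\n";
  }
  $xml .= '</urlset>';
  return $xml;
}
```

In the module itself the foreach over $rows would be the while ($r = db_fetch_object($result)) loop from the article, and the returned string would be printed after the Content-Type header has been sent.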

It doesn't appear to display more than about 40 nodes - is there not a way to show more?
er make that 22 nodes...

I use it on this site where it contains entries for all nodes with url aliases. The number is significantly higher than 22 or 40.

Are you sure you understand what this code does?

After struggling with the xmlsitemap module for a month, I finally found this article. It's very nicely explained, even for a newbie like myself. I implemented the custom module without any problems, and it works nicely. Thanks a lot...

I'm also seeing this issue: "This XML file does not appear to have any style information associated with it..." Does it have anything to do with Postgres as opposed to MySQL?
what's code like: #mydiv { position:absolute; top: 50%; left: 50%; width:30em; height:18em; margin-top: -9em; /*set to a negative number 1/2 of your height*/ margin-left: -15em; /*set to a negative number 1/2 of your width*/ border: 1px solid #ccc; background-color: #f3f3f3; }
I did not get it to work with Drupal 6.10 until I changed the item code from 'sitemap.xml' to 'sitemap'. Now it works perfectly!
Could anyone tell me how to use this code? (exactly)? What files do we have to create, where do we put them, where does this code go to work?

Check out this post http://dinaiz-two-dot-zero.blogspot.com/2008/07/easily-add-sitemap-to-dr... on Dinaiz' blog. He put this code into a module. I have not tried it, but I guess it is what you are looking for.

Hi Ramiro, it works, thanks. Great job. But how can I remove a URL that I don't want to appear? I have updated robots.txt, but it still comes up in the sitemap. It is for a private members area. Also, is there a way of weighting the pages? E.g. so I can say that the homepage is the most important, the second page is next, and the others don't matter. Any advice appreciated.

Hi Jonny, check out the XML Sitemap module for Drupal, http://drupal.org/project/xmlsitemap, which offers configuration options for priority and content types.
