
Clearing Drupal's Page Cache after Adding Comments

Using Drupal's cache can greatly reduce server load, especially if you have lots of visitors who are not logged in and many database queries have to be executed before a page is displayed. When caching is turned on, the HTML output of a page is stored in the cache_page table. A single query is then enough to retrieve that output for display, instead of the hundreds of queries needed on pages with many blocks, links, etc.

Logged-in users never get pages from the cache; their pages are generated anew on every request. So if a logged-in user posts a comment, for example, they see it as soon as they hit the submit button. Anonymous users, by contrast, have to wait until the minimum cache lifetime of the page has expired.

Suppose your comment and access settings allow anonymous users to post comments, and you rely on some kind of spam protection instead of moderating those comments. In that case this behavior irritates visitors, because no message tells them something like “your comment will be visible in XX minutes.”

The good news is that you can change this behavior quite easily by making use of Drupal's smart hook system. Instead of displaying a message like the one above, I clear the cache entry of the page that was commented on, so the comment is immediately visible to anonymous users as well. Note that only the entries for that particular node in the cache_page table are deleted per comment.

To do so, I add some code to my custom.module that implements hook_comment(). I write a custom module like this for almost every Drupal site I build, to hold the functionality that is specific to that site.

To also make sure that comments unpublished or deleted by the admin are no longer visible to anonymous visitors, the cache entry for that page is deleted in these cases as well.

Check out the following code snippet for Drupal 5.x to see how this kind of functionality can be realized:

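<?php
/**
 * Implements hook_comment() (Drupal 5.x).
 *
 * Clears the cached copy of the commented node's page so anonymous
 * visitors see new, edited, (un)published or deleted comments right away.
 */
function custom_comment($a1, $op) {
  // Initialize so other operations ('view', 'validate', ...) skip the clear.
  $nid = NULL;
  switch ($op) {
    case 'insert':
    case 'update':
      // $a1 is the array of submitted form values.
      $nid = $a1['nid'];
      break;
    case 'publish':
    case 'unpublish':
    case 'delete':
      // $a1 is the comment object.
      $nid = $a1->nid;
      break;
  }
  if ($nid) {
    // Retrieve the absolute URL for the node.
    $url = url('node/'. $nid, NULL, NULL, TRUE);
    // Delete the cache entry for that URL.
    cache_clear_all($url, 'cache_page');
  }
}
?>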

That's it. If you want to learn more about Drupal's cache system, I strongly recommend Jeff Eaton's article “A beginner's guide to caching data.”

This unfortunately doesn't work if multiple URLs can point at the same page.

Examples:

  • with clean URLs, /?q=node/1 and /node/1 can both point at the same page
  • with i18n, /node/1 and /en/node/1 can both point at the same page

Would be nice if there were a robust way to enumerate those possible duplicate URLs so that they could all get cleaned at the same time.

This means that each of them has its own cache entry, so it is better not to point multiple URLs at the same content. With clean URLs enabled, users won't see /?q=node/1 links. To keep robots from spidering these additional but useless URLs, you can exclude them in the robots.txt file.
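One rough way to handle the duplicate-URL problem raised above is to build every URL variant a node is expected to be reachable under, then clear each one.

This sketch rests on two assumptions:
- cache_page is keyed by the absolute request URL (base URL plus request URI), which is how Drupal 5 and 6 store cached pages as far as I know.
- The site's language prefixes are known in advance.

Both function names, the example base URL and the prefix list are hypothetical. The list of variants is a guess, not a complete enumeration; for example, query strings such as comment paging are not covered.

```php
<?php
/**
 * Hypothetical helper: returns the page-cache IDs under which copies of
 * the same page may be stored. Assumes cache IDs are absolute request URLs.
 *
 * @param string $base_url  Site base URL without trailing slash.
 * @param array  $paths     Internal paths, e.g. 'node/1' and its alias.
 * @param array  $prefixes  Language prefixes; '' means no prefix.
 */
function custom_page_cache_cids($base_url, array $paths, array $prefixes = array('')) {
  $cids = array();
  foreach ($prefixes as $prefix) {
    foreach ($paths as $path) {
      $full = ($prefix === '') ? $path : $prefix . '/' . $path;
      $cids[] = $base_url . '/' . $full;             // clean URL
      $cids[] = $base_url . '/?q=' . $full;          // clean URLs off, as url() builds it on Apache
      $cids[] = $base_url . '/index.php?q=' . $full; // clean URLs off, index.php spelled out
    }
  }
  return array_values(array_unique($cids));
}

/**
 * Hypothetical helper: clears every guessed cache entry for a node.
 * Requires a bootstrapped Drupal 5/6 site (url aliases, cache API).
 */
function custom_clear_node_page_cache($nid) {
  global $base_url;
  $path = 'node/' . $nid;
  // Include the URL alias too; if there is none, drupal_get_path_alias()
  // returns the system path and array_unique() drops the duplicate.
  $paths = array_unique(array($path, drupal_get_path_alias($path)));
  // Hypothetical prefix list; adjust to the languages your site uses.
  $prefixes = array('', 'en');
  foreach (custom_page_cache_cids($base_url, $paths, $prefixes) as $cid) {
    cache_clear_all($cid, 'cache_page');
  }
}
```

To use it, call custom_clear_node_page_cache($nid) in place of the single cache_clear_all() call in the hook_comment() implementation. Keeping the URL-building logic in its own function means it can be checked without a running Drupal site.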

"clean urls could have /?" I'm not really sure, but I think the answer is yes
Guys, what is HTML?
Thanks, this should be merged into comment.module; now I can use comments with aggressive caching :).
I updated the script for Drupal 6.2:

function keffcache_comment($a1, $op) {
  // Initialize so other operations skip the cache clear.
  $nid = NULL;
  switch ($op) {
    case 'insert':
    case 'update':
      // $a1 is the array of submitted form values.
      $nid = $a1['nid'];
      break;
    case 'publish':
    case 'unpublish':
    case 'delete':
      // $a1 is the comment object.
      $nid = $a1->nid;
      break;
  }
  if ($nid) {
    // retrieve the absolute url for the node
    $url = url('node/'. $nid, array('absolute' => TRUE));
    // delete cache entries for that url
    cache_clear_all($url, 'cache_page');
  }
}

PS: a filter that converts newlines to BR tags in comments would be nice :) I had to insert a BR tag manually on every line of the preceding text.
Thanks a lot, Keff, for your Drupal 6 snippet!

Found this most useful article when I had a related problem: I have forms that anonymous users can fill in, with default values stored in the session. Because the form page (as displayed from an initial GET request) was being cached, the default values came from the cached copy rather than from the current session.

I fixed this by setting $GLOBALS['conf']['cache'] to FALSE for the affected form pages, thanks to a post in the Protected Node module issue queue (http://drupal.org/node/233979). To keep things neat, I did this in my module's hook_form_alter(), with a simple check on $form_id so it only applies to the forms with the problem.

This very neatly stops the pages with these forms from being cached, even on a GET request, while all other pages are cached for anonymous users as normal.
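Here is a minimal sketch of the approach described in this comment. It assumes the Drupal 6 signature of hook_form_alter(); Drupal 5 uses hook_form_alter($form_id, &$form) instead. The module name mymodule and the two form IDs are hypothetical placeholders.

Declaring global $conf is equivalent to writing $GLOBALS['conf']['cache'] as in the comment. Because variable_get() reads from $conf, the override only affects the current request and leaves the saved setting alone.

```php
<?php
/**
 * Implements hook_form_alter() for a hypothetical module "mymodule"
 * (Drupal 6 signature).
 *
 * Turns off the page cache for the current request whenever one of the
 * listed forms is built, so anonymous users get defaults from their own
 * session instead of a cached copy of the page.
 */
function mymodule_form_alter(&$form, &$form_state, $form_id) {
  global $conf;
  // Hypothetical form IDs: list the forms whose defaults come from $_SESSION.
  $uncached_forms = array('mymodule_order_form', 'mymodule_survey_form');
  if (in_array($form_id, $uncached_forms, TRUE)) {
    // variable_get('cache') reads $conf, so this disables page caching
    // for this request only; the stored setting is not changed.
    $conf['cache'] = FALSE;
  }
}
```

All other forms leave $conf untouched, so every other page is still cached for anonymous users.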
