Replacing node/123 links with more meaningful URLs

Ever wanted to know why Drupal exposes content via links like "node/123" instead of "content/some-more-meaningful-page-title-reference"? Or are you unhappy with URLs like "node/123"? Then read on.

The way Drupal manages content requests out-of-the-box is very lean and efficient, but it is not very meaningful for human beings, content recommendation engines or search engines. Fortunately there is an easy solution, requiring only 2 extra modules: Pathauto and Token.

Drupal and URLs

Drupal resolves links to content by parsing the URL specified in the HTTP request issued by a visitor's web browser. When you deploy a Drupal release without any contributed modules, content will invariably be referenced with URLs ending in something like "node/123":

  • The "node" part informs Drupal that the visitor wants to see a content node.
  • The "123" is a number that refers to the unique identifier of one particular content node (nid in Drupal jargon) in your Drupal deployment (in this case, 123).

Similarly, users are referenced by URLs ending in "user/123", and the Drupal administration interface has a URL containing "/admin/".
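
To make the scheme concrete, here is a minimal Python sketch of how a path like "node/123" splits into a kind and a numeric identifier. It is only an illustration: Drupal itself is written in PHP, and the names `InternalPath` and `parse_internal_path` are hypothetical, not part of any Drupal API.

```python
from typing import NamedTuple, Optional


class InternalPath(NamedTuple):
    kind: str             # first path segment, e.g. "node", "user" or "admin"
    ident: Optional[int]  # numeric identifier (the nid for nodes), if present


def parse_internal_path(path: str) -> InternalPath:
    """Split a path such as 'node/123' into its kind and numeric id."""
    parts = [p for p in path.strip("/").split("/") if p]
    if not parts:
        return InternalPath("", None)
    # Only treat the second segment as an id when it is purely numeric.
    has_id = len(parts) > 1 and parts[1].isdecimal()
    return InternalPath(parts[0], int(parts[1]) if has_id else None)


print(parse_internal_path("node/123"))          # kind "node", ident 123
print(parse_internal_path("admin/build/path"))  # kind "admin", no ident
```

The lookup is cheap because the identifier is right there in the URL, which is exactly why the scheme is efficient for Drupal and opaque for people.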

This approach offers a lean and effective way to unambiguously reference content on a Drupal-enabled site. Fetching the requested content takes Drupal little effort in this scheme. However, it was not designed with a human being (or a search engine) in mind. Indeed, it's hard to guess which content hides behind "node/123" unless you have the page open somewhere or have jotted down the mapping on a paper napkin.

Automated creation of URL aliases

The first item we want to address is giving more meaningful URLs to our content. The Pathauto module supports automated generation of URL aliases for any content managed by a Drupal site, be it content nodes, users, or taxonomy.

Pathauto is a very powerful module. It relies on the Token module to offer additional patterns (tokens) that can be used for constructing a meaningful URL alias. Pathauto also provides facilities for replacing unsafe characters in URLs with safe "ASCII-96" representations; for this feature you must rename the example character translation file (see below). Finally, Pathauto allows assigning new URL aliases to existing content (bulk operations).
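
To give an idea of what "replacing unsafe characters" amounts to, here is a rough Python sketch of that kind of cleanup. It is not Pathauto's implementation: Pathauto works from the mapping in "i18n-ascii.txt", whereas this sketch assumes Unicode decomposition is a good enough stand-in, and the function name `clean_alias_segment` is made up for the example.

```python
import re
import unicodedata


def clean_alias_segment(text: str, separator: str = "-") -> str:
    """Turn free text into a lowercase, ASCII-only URL segment."""
    # Decompose accented characters, then drop anything outside ASCII.
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into one separator.
    slug = re.sub(r"[^A-Za-z0-9]+", separator, ascii_only)
    return slug.strip(separator).lower()


print(clean_alias_segment("Special Conference"))        # special-conference
print(clean_alias_segment("Crème brûlée à la carte!"))  # creme-brulee-a-la-carte
```

Characters that have no ASCII decomposition are simply dropped here, which is one reason a hand-maintained translation file can give better results than this shortcut.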

Installation of this module is straightforward: download the module from the Drupal repository, unpack the archive, rename the file "i18n-ascii.example.txt" to "i18n-ascii.txt", and finally upload the "pathauto" directory to your Drupal's "sites/all/modules" directory.

After enabling the module via admin/build/modules, it is time to configure how automated URL aliases should be generated. Pathauto's administrative interface is fully integrated in the "URL aliases" section at "admin/build/path". Click on the "Automated aliases settings" tab (or go directly to "admin/build/path/pathauto") to access the Pathauto settings.

At first you may be a bit overwhelmed. This is normal, but fortunately most of the settings should not be touched. Phew, you're probably already feeling better now, I bet. So let's quickly explain what to do to get Pathauto running:

  • General settings: this will expose a first set of settings governing the way Pathauto should do its job. Important settings:
    • Character case - set to lower case to convert the URL alias to lowercase (recommended)
    • Update action - if you want to assign new URL aliases for existing content (bulk update), you must not choose "Do nothing". This is a major cause of confusion. Later you can change this setting to some other value.
    • Transliterate prior to creating alias - enable
    • Reduce strings to letters and numbers from ASCII-96 - enable. If you can't enable this setting, you probably forgot to rename "i18n-ascii.example.txt" to "i18n-ascii.txt"
  • Punctuation settings: defines how punctuation and other special non-alphanumeric symbols should be processed. Leave unchanged.
  • Blog path settings: defines how user blog aliases will be set up. If you have blogs on your site, you want to enable this.
    • The default token substitution is "blogs/[user-raw]" and will replace links to user blogs with e.g. "blogs/olivier-biot" for my blog. A complete list of replacement tokens will be displayed if you expand the "Replacement patterns" field group.
    • You can also have your replacement URL apply to RSS/Atom feeds. In this case, don't leave the "Internal feed alias text" value empty. Recommended value is "feed".
    • If existing blogs have to be realiased in bulk, don't forget to tick "Bulk generate aliases for blogs that are not aliased".
  • Node path settings: defines how content node aliases will be set up. This is probably the main reason why you wanted this module in the first place. You can define how aliases are created on two levels: per content type, and for all content types. If a pattern exists for the specific content type you want an alias for, it will be applied. Otherwise, the default pattern will be applied if it exists. This means that the only way to skip URL alias creation for a given content type is to leave the default pattern empty and to specify patterns only for the content types you want a URL alias policy for. Let's look at the configuration in more detail now:
    • Default path pattern - specifies the default replacement pattern that will apply if you do not specify a per content type pattern. Again, an exhaustive list of replacement tokens is displayed when clicking on the "replacement patterns" link. On my site I use the following default replacement pattern: "content/[type]/[yyyy]-[mm]-[dd]/[title-raw]"
    • Pattern for all Blog entry paths - specifies how URLs of individual blog posts should be aliased. On my site I use "blogs/[author-name-raw]/[yyyy]-[mm]-[dd]/[title-raw]"
    • Pattern for all XXX entry paths - specifies how URLs of individual XXX items should be aliased. On my site I left all of the remaining content type specific alias rules empty so the default rule will apply to them.
    • If existing content has to be realiased in bulk, don't forget to tick "Bulk generate aliases for nodes that are not aliased".
  • Taxonomy term path settings: defines how taxonomy terms and vocabularies can be aliased automatically. See "node path settings" above for the details (very similar).
  • User path settings: defines how individual users can be aliased automatically. On my site I left this empty so users are not aliased. This is only relevant for users that register on your Drupal site as they are the only ones seeing these links.
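
The pattern lookup and token replacement described under "Node path settings" can be sketched in a few lines of Python. This is a simplified model under my own assumptions, not Pathauto's code: `pick_pattern` and `expand_pattern` are hypothetical names, and the token values (the date and the title) are made-up sample data.

```python
import re
from typing import Dict, Optional


def pick_pattern(node_type: str, default: str,
                 per_type: Dict[str, str]) -> Optional[str]:
    """Return the type-specific pattern if set, else the default, else None."""
    return per_type.get(node_type) or default or None


def expand_pattern(pattern: str, tokens: Dict[str, str]) -> str:
    """Replace each [token] with its value; unknown tokens are left untouched."""
    return re.sub(r"\[([a-z-]+)\]",
                  lambda m: tokens.get(m.group(1), m.group(0)),
                  pattern)


DEFAULT = "content/[type]/[yyyy]-[mm]-[dd]/[title-raw]"
PER_TYPE = {"blog": "blogs/[author-name-raw]/[yyyy]-[mm]-[dd]/[title-raw]"}

# Sample token values for a "page" node; the date is invented for the example.
tokens = {"type": "page", "yyyy": "2009", "mm": "03", "dd": "14",
          "title-raw": "special-conference"}

pattern = pick_pattern("page", DEFAULT, PER_TYPE)  # no "page" pattern, so DEFAULT
print(expand_pattern(pattern, tokens))  # content/page/2009-03-14/special-conference
```

With an empty default and no entry for the content type, `pick_pattern` returns `None`, which corresponds to "no alias is generated for this content type".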

Remarks

  1. The Pathauto module works at 2 levels:

    1. Creation of aliases for content that already existed prior to installing this module
    2. Creation of aliases for new content

    From the moment you define URL alias replacement patterns, every new piece of content you create on your Drupal site will get a URL alias according to your settings.

    When you want to process existing content, make sure to select the correct behavior under "Update action" in "General settings". This setting does not affect new content!

  2. Not every replacement token is available in each of the 4 alias categories (blog, node, taxonomy term, user). A list of allowable tokens is given in each category.
  3. When defining replacement patterns for the automatic URL alias generation process, one should try to avoid cases where the same URL alias would be generated for different content. For example, using only "[title-raw]" as a pattern in a forum might be prone to duplication. For a Wiki this would not be a problem as it makes little sense for Wikis to have duplicate page titles. There is however no silver bullet solution for defining a suitable replacement pattern. If unsure, use the content node identifier (nid in Drupal jargon) somewhere at the start of your URL pattern when defining content node aliases.
  4. Pathauto also provides a bulk alias deletion function. Use it with caution, and read the documentation that comes with the module first.
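
The duplication risk from remark 3 is easy to check for yourself. Below is a small Python sketch, assuming you have a mapping from nid to the alias a pattern would produce; the helper `find_collisions` and the sample nids and titles are invented for the illustration.

```python
from collections import defaultdict
from typing import Dict, List


def find_collisions(aliases: Dict[int, str]) -> Dict[str, List[int]]:
    """Map each alias shared by more than one nid to the sorted nids using it."""
    by_alias = defaultdict(list)
    for nid, alias in aliases.items():
        by_alias[alias].append(nid)
    return {alias: sorted(nids) for alias, nids in by_alias.items()
            if len(nids) > 1}


# Two forum topics with the same title, aliased with a title-only pattern.
title_only = {41: "forum/help-needed", 57: "forum/help-needed",
              60: "forum/welcome"}
print(find_collisions(title_only))  # {'forum/help-needed': [41, 57]}

# The same topics with the nid placed in front of the title.
with_nid = {41: "forum/41-help-needed", 57: "forum/57-help-needed",
            60: "forum/60-welcome"}
print(find_collisions(with_nid))    # {}
```

Because the nid is unique per node, any pattern that includes it cannot collide, at the cost of a slightly less pretty URL.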

Now that we have our URL aliases generated on the fly, we have 2 URLs pointing to the same content: the internal Drupal path ("node/123") and the URL alias generated by Pathauto. Why is this not a good idea, and how do you get rid of it? Wait for my follow-up blog post.

Shahnawaz's picture

Thanks

Thanks for this post. I am new to drupal and it's really very helpful in my case.

john blue's picture

Adding new aliases to existing aliases nodes?

Thanks for post, good guidance.
Current setup: I use Drupal 5.x and have had Pathauto (using 5.x-2.2 currently) installed for a year or two. 
Goal: I now need to create new set of aliases for all the nodes to help support Google News requirements (Display a three-digit number. http://www.google.com/support/news_pub/bin/answer.py?hl=en&answer=68323 ).
Question: I believe Pathauto can accomplish this(?). However, when I read "Bulk generate aliases for nodes that are not aliased" am I to presume that any node that already has an alias will not get any new aliases generated for it?
For example, I have a page, node/1171, whose title is "Special Conference". This page has two URLs; http://mysitename.com/node/1171 and http://mysitename.com/special-conference (generated by existing Pathauto) . I have about 500 pages like this. 
Now I want to create a new set of aliases for these existing pages, using the pattern 
content/[type]-000[nid]/[title-raw] 
Using the example above,  http://mysitename.com/node/1171 will have a new URL alias created that would be http://mysitename.com/content/page-0001171/special-conference, AND the existing URL alias http://mysitename.com/special-conference will still exist.
So when I check "Bulk generate aliases for nodes that are not aliased" in Pathauto's Node path settings area, provide the patterns I need and click "save configuration", will the pages that already have the original URL aliases get a new Google News friendly URL alias? I suspect not... My test server checks of these steps also imply this but I am not sure if I missed something.
Thoughts and guidance appreciated.
John Blue
fyi, I have these settings setup before saving configurations
 
General settings 
* checked: Verbose
* checked: Character case Change to lower case
* checked: Update action=Create a new alias. Leave the existing alias functioning.
* checked: Transliterate prior to creating alias
* checked: Reduce strings to letters and numbers from ASCII-96
* Maximum number of objects to alias in a bulk update = 2000
Punctuation settings
* changed: Hyphen set to no action (do not replace) 
Node path settings
* checked: Bulk generate aliases for nodes that are not aliased

vishal's picture

I am unable to get it done.

I want my content to have the category name, like "soap", and then the title. I managed to get the title right by using [node:title], but I can't seem to get the taxonomy part right. How should I write that?
Category/[type] ?
Your help will be much appreciated
 
cheers,
vishal

sahasra's picture

clarification

Everything is fine. Now, can we give multiple URL aliases to the same webform?

BrianLewisDesign's picture

[node:menu-link:parents:join-path]

 
[node:menu-link:parent:url:path]/[node:title] -- this repeats the top level parent multiple times, for nth children in the menu. makes a mess. doesn't get the full hierarchy.
[node:menu-link:parent:parent:parent:parent:url:path]/[node:title] -- this gets the top parent only, with no repeats, down to the 4th child in the menu.
[node:menu-link:parents:join-path]/[node:title] -- perfect. this gets the whole parent menu hierarchy into the URL. that's what i was after.
