Jay Vincent
Howdy!

My name's Jay and I'm a web designer & developer from High Wycombe in the UK.

(I'm also a keen skateboarder and golf enthusiast)

Whitelisting alpha-numerical characters to create friendly URL strings (slugs)

Published 3 months ago under PHP, Regular Expressions

You may have noticed the tidy URLs on this site used to link to each article. This is done by turning each article title into a URL-friendly string, which is then used as a unique key in my articles database. Many blogging systems do this as standard, but what if (like me) you prefer to build your own solutions?

What we want to do is turn this article title:
Whitelisting alpha-numerical characters to create friendly URL strings (slugs)
into a URL-friendly string:
whitelisting-alphanumerical-characters-to-create-friendly-url-strings-slugs

And here's my function:

1: function strSEO($str) {
2:
3:    $str = preg_replace("/&#x[a-z0-9]{4};/i", "", $str);
4:    $str = html_entity_decode($str, ENT_QUOTES);
5:    $str = strtolower($str);
6:    $str = preg_replace("/[^a-z0-9\s]/i", "", $str);
7:    $str = preg_replace("/ +/", "-", trim($str));
8:    return $str;
9:
10: }

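Line 3: removes hexadecimal character references such as &#x2019; (the pattern matches &#x, then four letters or digits, then a semicolon).
Line 4: converts any remaining HTML entities into their characters - for example &amp; becomes &.
Line 5: turns the string to lower-case.
Line 6: removes any character which isn't alpha-numeric or whitespace. This includes hyphens already in the title, so "alpha-numerical" becomes "alphanumerical".
Line 7: trims the beginning and end of the string, and replaces spaces with hyphens. Note that multiple consecutive spaces are only replaced with one hyphen.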
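
To see the steps working together, here is a minimal sketch that restates them as a standalone function and runs it on a few sample titles. The name slugSketch() and the sample titles are my own assumptions for illustration; the logic is meant to mirror lines 3 to 7 above rather than replace them.

```php
<?php
// Hypothetical restatement of lines 3-7; slugSketch() is an assumed name.
function slugSketch(string $title): string
{
    $s = preg_replace('/&#x[a-z0-9]{4};/i', '', $title); // drop &#xNNNN; references
    $s = html_entity_decode($s, ENT_QUOTES);             // &amp; becomes &
    $s = strtolower($s);                                 // lower-case everything
    $s = preg_replace('/[^a-z0-9\s]/', '', $s);          // whitelist letters, digits, whitespace
    return preg_replace('/ +/', '-', trim($s));          // each run of spaces becomes one hyphen
}

echo slugSketch('Fish &amp; Chips:   a  review'), "\n"; // fish-chips-a-review
echo slugSketch('  Caf&#x00e9; society '), "\n";        // caf-society
echo slugSketch('alpha-numerical'), "\n";               // alphanumerical
```

The second title shows a side effect of stripping character references before decoding: the accented letter disappears altogether rather than being converted to a plain "e".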

Maintaining the query string with mod_rewrite URLs.

Published 4 months ago under Server Config, Regular Expressions

Mod_rewrite is a very handy tool for the SEO- and UX-conscious web developer. It lets us map pretty, legible web addresses onto not-so-pretty ones, and build the query string using regular expressions.

For example:

http://www.example.com/films/horror/2008
can map to:
http://www.example.com/view.php?media=films&genre=horror&year=2008

with this mod_rewrite rule in the .htaccess:

RewriteRule ^([a-z0-9]+)/([a-z0-9]+)/([a-z0-9]+)/?$ view.php?media=$1&genre=$2&year=$3

My problem has always been that once I'd used mod_rewrite on a URL, I couldn't add extra query string variables onto it, as they wouldn't be passed through the rule:

http://www.example.com/films/horror/2008?orderby=price
(the variable $_GET['orderby'] is not available on the view.php page)

My inelegant solution was to use $_SERVER['REQUEST_URI'] and split the URL with ? as the delimiter, thereby accessing whatever followed it.
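
For reference, that workaround might look roughly like the sketch below. The helper name queryFromRequestUri() is hypothetical, and the URI is passed in as a plain string so the example runs outside a web server; on a live page it would come from $_SERVER['REQUEST_URI'].

```php
<?php
// Hypothetical helper: recover query-string variables from the raw request URI.
function queryFromRequestUri(string $requestUri): array
{
    // Split once on "?"; anything after it is the raw query string.
    $parts = explode('?', $requestUri, 2);
    $query = $parts[1] ?? '';

    // parse_str() turns "a=1&b=2" into an associative array.
    $params = [];
    parse_str($query, $params);
    return $params;
}

// On a real page the argument would be $_SERVER['REQUEST_URI'].
$params = queryFromRequestUri('/films/horror/2008?orderby=price');
echo $params['orderby'], "\n"; // price
```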

The actual solution to the problem is extremely simple though. All that is needed is to add &%{QUERY_STRING} to the end of the mod_rewrite rule, and any extra query string variables are handled correctly:

RewriteRule ^([a-z0-9]+)/([a-z0-9]+)/([a-z0-9]+)/?$ view.php?media=$1&genre=$2&year=$3&%{QUERY_STRING}
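
To make the effect of the rule concrete, here is a rough PHP simulation of the substitution. Apache does this work itself, so simulateRewrite() is purely a hypothetical illustration; it assumes the .htaccess context, where the path is matched without its leading slash.

```php
<?php
// Hypothetical simulation of the rule above; Apache performs the real rewrite.
function simulateRewrite(string $path, string $queryString): ?string
{
    // Same pattern as the RewriteRule, wrapped in # delimiters for PCRE.
    $pattern = '#^([a-z0-9]+)/([a-z0-9]+)/([a-z0-9]+)/?$#';
    if (!preg_match($pattern, $path, $m)) {
        return null; // the rule does not apply to this path
    }
    // $1, $2 and $3 become $m[1..3]; %{QUERY_STRING} is the original query.
    return "view.php?media={$m[1]}&genre={$m[2]}&year={$m[3]}&{$queryString}";
}

$target = simulateRewrite('films/horror/2008', 'orderby=price');
echo $target, "\n"; // view.php?media=films&genre=horror&year=2008&orderby=price

// What view.php would then find in $_GET:
parse_str(explode('?', $target, 2)[1], $get);
echo $get['orderby'], "\n"; // price
```

When the original request has no query string, the rewritten URL simply ends with a trailing &.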
