Blogging Stuff in Plain English: .htaccess

.htaccess: the file that controls how your site can be accessed. But what do those weird characters mean, and how do they actually do the job?

The .htaccess file is one of those things you won’t touch until you are in deep, deep trouble, or until you want to make a major change to your site.

Rest assured, I’ve done both: troubleshooting some big problems on my own website and on several other sites that I manage, as well as making some major changes to those sites. There is even an entire site dedicated to this one file. Chances are, you’re here after (or while) facing some trouble with .htaccess.

For starters, let’s answer the basic questions:

What is htaccess and why does it exist?

.htaccess is a configuration file that defines access control (and related settings) for a site. It works on web servers that run the Apache HTTP Server software, and on a few other servers that understand the same format.

It is short for hypertext access. It has a dot at the beginning because that’s how you hide a file in a Unix-based environment, and most web servers run on Linux.

(The more you know)

Consider the most common situation: you are on shared hosting, meaning a bunch of other websites are running on the same web server. Each website wants its own access-related configuration, such as:

  • URL redirection, e.g. when you want to send people who visit an obsolete part of your site to a new location
  • who can access the site, e.g. password-protected sites
  • shortening URLs that are too long
  • etc.

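So, to make this work for every site, each site has its own .htaccess file(s): usually one in the site’s root directory, and optionally more in subdirectories, where each one applies to the directory it sits in and everything below it.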

What if a site doesn’t have an .htaccess file? Not a problem; the site simply follows the web server’s default configuration.

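You’ll usually find .htaccess in the root directory of your site, which you can browse with your hosting file manager (for example, through cPanel). Because it’s a hidden file, you may need to turn on the option to show hidden files before it appears.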

What does the file say?

What draws me in the most is the peculiar way the file defines the access configuration. I have some familiarity with HTML, but at first it was kind of hard for me to decipher what the file was saying.

In this section, we’ll try to figure out what our .htaccess file actually says. Here is an example from a WordPress-based site that keeps WordPress in a /wp/ subdirectory:

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /wp/
RewriteRule ^index\.php$ - [L]
RewriteCond %{HTTP_HOST} ^(www.)?yoursite.com$
RewriteRule ^(/)?$ /wp [L]

# add a trailing slash to /wp-admin
RewriteRule ^wp-admin$ wp-admin/ [R=301,L]

RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^ - [L]
RewriteRule ^(wp-(content|admin|includes).*) $1 [L]
RewriteRule ^(.*\.php)$ wp/$1 [L]
RewriteRule . index.php [L]
</IfModule>

The first line, # BEGIN WordPress, is just a comment: Apache ignores lines starting with #, and WordPress uses this one to mark the part of the file it manages. The next line, <IfModule mod_rewrite.c>, basically says: if the module mod_rewrite is loaded on the web server, apply the directives that follow, up to the closing </IfModule>.

Here, mod_rewrite.c is the name of the source file of mod_rewrite, the Apache module that rewrites requested URLs on the fly according to the rules that follow. <IfModule> accepts either a module’s name or its source file name, which is why you see the .c extension: Apache is written in the C programming language. That’s also why the syntax took me aback at first; it isn’t the HTML I’m familiar with. The angle-bracket sections such as <IfModule> … </IfModule> look a bit like HTML or XML, but they are Apache’s own configuration syntax.

Cool, so let’s dive into the next lines, which use four distinct commands, or rather directives: RewriteEngine, RewriteBase, RewriteRule, and RewriteCond. What are they?

RewriteEngine On is pretty straightforward: it switches on the runtime rewriting engine.

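RewriteBase sets the base URL path for the rules that follow: when a rule rewrites to a relative target (one that doesn’t start with /), this path is put in front of it. You usually need it when the site’s content lives in a subdirectory rather than in the document root. In the example above, the base is /wp/, so the last rule’s target, index.php, ends up as /wp/index.php.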
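Here’s a rough Python model of that prefixing step. It is a sketch under a simplifying assumption: any target that doesn’t start with a slash or contain a scheme such as http:// counts as relative and gets the RewriteBase value put in front. The function name apply_rewrite_base is made up for illustration, and mod_rewrite’s real handling covers more cases.

```python
def apply_rewrite_base(rewrite_base: str, substitution: str) -> str:
    """Hypothetical helper: resolve a RewriteRule target against RewriteBase.

    Simplified model: absolute paths ("/wp") and full URLs ("https://...")
    are returned unchanged; anything else is treated as relative.
    """
    if substitution.startswith("/") or "://" in substitution:
        return substitution
    # Join the base and the relative target with exactly one slash.
    return rewrite_base.rstrip("/") + "/" + substitution


print(apply_rewrite_base("/wp/", "index.php"))  # /wp/index.php
print(apply_rewrite_base("/wp/", "/wp"))        # /wp
```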

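RewriteRule defines the actual rule: a pattern to match against the requested URL path, a substitution (what to rewrite it to), and optional flags in square brackets. The third line inside the <IfModule> block, RewriteRule ^index\.php$ - [L], says: when a user requests index.php, leave the URL unchanged (that’s what the dash means) and stop processing the rules that follow. In other words, index.php is served as it is, without being rewritten.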

This rule keeps requests for index.php from being rewritten, which avoids unnecessary file system checks. The WordPress developers added it to the default .htaccess of newer WordPress versions to fix an issue with Apache’s mod_rewrite. Jeff Starr from Digging into WordPress explained this specific rule in detail.



Now, why do we have those ^, \ and $ symbols?

These symbols are what we call ‘terms’ of a regular expression (regex), and each has a specific function. In the example above, ^ marks the beginning of the URL path being matched and $ marks its end. \ (backslash) is an escape character: it removes the special function of the character right after it. In this example, the dot (.) is actually a term as well, since on its own it matches any single character. By putting a backslash before it, the dot is no longer a functional term, just a literal character like the letters around it.
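mod_rewrite uses Perl-compatible (PCRE) regular expressions, and Python’s re module treats these basic terms the same way, so it’s a handy place to experiment. This small sketch compares the pattern from the example with a version where the dot is left unescaped; the test paths are made up.

```python
import re

escaped = re.compile(r"^index\.php$")   # \. matches a literal dot only
unescaped = re.compile(r"^index.php$")  # .  matches any single character

for path in ["index.php", "indexXphp", "old/index.php"]:
    print(path, bool(escaped.search(path)), bool(unescaped.search(path)))
# index.php True True
# indexXphp False True
# old/index.php False False
```

The last path fails both patterns because ^ pins the match to the very start, so nothing may come before “index”.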

And finally, [L] is a flag that stands for “last rule”: if this rule matches, mod_rewrite stops processing the rules after it. The dash before it is not just for spacing, though. It sits in the substitution slot of the rule, and a single dash means “don’t change the URL”. If you want a reference for the full list of terms, go check this thread at Webmaster World.
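To see what the dash and [L] actually do, here is a toy rule runner in Python. It’s a simplified sketch, not how Apache works internally: the RULES list and the run_rules function are hypothetical, substitutions are plain strings rather than patterns with backreferences, and it ignores the fact that in .htaccess mod_rewrite may run a rewritten request through the rules again.

```python
import re

# Hypothetical rule list, modelled on two lines of the example:
# (pattern, substitution, has_L_flag)
RULES = [
    (r"^index\.php$", "-", True),  # RewriteRule ^index\.php$ - [L]
    (r".", "index.php", True),     # RewriteRule . index.php [L]
]


def run_rules(path: str) -> str:
    """Apply RULES top to bottom, honoring '-' and the [L] flag."""
    for pattern, substitution, is_last in RULES:
        if re.search(pattern, path):
            if substitution != "-":  # "-" means: keep the path unchanged
                path = substitution
            if is_last:              # [L] means: skip all remaining rules
                break
    return path


print(run_rules("index.php"))  # index.php (first rule matches, nothing changes)
print(run_rules("about-us"))   # index.php (caught by the catch-all rule)
```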

Obviously, there can be more than one rule, and RewriteCond is there to add a condition: the RewriteRule directly below it only applies if the condition is met. Take a look at the fourth and fifth lines inside the block, for instance.

RewriteCond %{HTTP_HOST} ^(www.)?yoursite.com$
RewriteRule ^(/)?$ /wp [L]

The 4th line sets the condition for the 5th line: when a user visits yoursite.com, the request is “redirected” to yoursite.com/wp. (Strictly speaking it’s an internal rewrite, since the rule has no R flag.)

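The %{HTTP_HOST} variable holds the hostname the visitor asked for (the Host header of the request), so the condition can cover both variants of your domain: yoursite.com and www.yoursite.com. It holds just the host name, not the http:// prefix or any path such as a trailing slash.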

Let’s decipher the terms, shall we?

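%{...} is how you reference a variable, which in this case is HTTP_HOST. As usual, ^ and $ mark the start and end of the pattern, respectively. The (www.)? group matches the sequence “www.” zero or one time, meaning it is optional to have www. before yoursite.com. (Strictly speaking, the dots here are unescaped, so each one matches any character; ^(www\.)?yoursite\.com$ would be the tighter version.)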
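Here’s a quick Python check of the host condition, comparing the pattern as written in the example with an escaped version. yoursite.com is the placeholder from the example, and the other host names are made up to show where the two patterns differ.

```python
import re

as_written = re.compile(r"^(www.)?yoursite.com$")  # dots unescaped, as in the example
escaped = re.compile(r"^(www\.)?yoursite\.com$")   # dots match only a literal "."

for host in ["yoursite.com", "www.yoursite.com", "blog.yoursite.com", "wwwXyoursiteXcom"]:
    print(host, bool(as_written.search(host)), bool(escaped.search(host)))
# yoursite.com True True
# www.yoursite.com True True
# blog.yoursite.com False False
# wwwXyoursiteXcom True False
```

In practice the loose version rarely causes trouble, since browsers won’t send a host like wwwXyoursiteXcom to your server, but the escaped one says exactly what you mean.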

On the 5th line, the pattern ^(/)?$ matches a slash zero or one time and nothing else, so it only matches the bare home page, with or without the slash. The second part is what the rule is about: it sends every request for your top-level domain (yoursite.com, for instance) to /wp.
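Putting the condition and the rule together, a sketch in Python could look like this. The function rewrite_home is hypothetical and only models these two lines. One detail worth knowing: in .htaccess, mod_rewrite strips the leading slash from the path before matching, so a request for the home page usually reaches the rule as an empty string.

```python
import re

HOST_CONDITION = re.compile(r"^(www.)?yoursite.com$")  # RewriteCond %{HTTP_HOST} ...
HOME_PATTERN = re.compile(r"^(/)?$")                   # RewriteRule ^(/)?$ /wp [L]


def rewrite_home(http_host: str, path: str):
    """Return "/wp" if both the rule pattern and its condition match, else None."""
    # mod_rewrite checks the RewriteRule pattern first, then its RewriteCond lines.
    if HOME_PATTERN.search(path) and HOST_CONDITION.search(http_host):
        return "/wp"
    return None  # rule skipped; the request moves on to the next rules


print(rewrite_home("www.yoursite.com", ""))   # /wp
print(rewrite_home("yoursite.com", "/"))      # /wp
print(rewrite_home("yoursite.com", "about"))  # None
print(rewrite_home("othersite.com", ""))      # None
```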

Game Time!

Now that we’ve covered the basics of reading an .htaccess file, try to decipher the next section. Do some googling to find out what R=301 means.

# add a trailing slash to /wp-admin
RewriteRule ^wp-admin$ wp-admin/ [R=301,L]

Got them?


This part says: when someone requests wp-admin without a trailing slash, send the browser to wp-admin/ (with a slash at the end) using a 301, or permanent, redirect. There are other types of redirects (300, 302, 308, etc.) that Search Engine People has explained.
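The difference from the earlier rules is that the browser is told to go somewhere else instead of the server quietly serving different content. This hypothetical Python function models the rule by returning the status code and target. It’s a simplification: on a real server, Apache turns the relative target into a full URL (using RewriteBase and your hostname) before sending it back.

```python
import re


def trailing_slash_redirect(path: str):
    """Sketch of RewriteRule ^wp-admin$ wp-admin/ [R=301,L].

    Returns (status_code, target) when the rule fires, otherwise None.
    """
    if re.search(r"^wp-admin$", path):
        # R=301 makes Apache answer the browser with "301 Moved Permanently"
        # instead of quietly rewriting; [L] stops any further rules.
        return 301, "wp-admin/"
    return None


print(trailing_slash_redirect("wp-admin"))   # (301, 'wp-admin/')
print(trailing_slash_redirect("wp-admin/"))  # None: the slash is already there
```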

Now we’re on to the last few lines of the .htaccess file. I’ll explain these in a rapid-fire round, since the explanations are pretty repetitive.

RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d

These lines say: if somebody requests something that actually exists on the server (checked via the REQUEST_FILENAME variable, the full file-system path of the request), whether it is a regular file (the -f test) or a directory (the -d test; the [OR] flag means either one is enough), then…

RewriteRule ^ - [L]: leave the request alone and stop processing rules. (^ matches any path, and - means no substitution.)
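In Python terms, -f and -d are roughly os.path.isfile and os.path.isdir applied to the file-system path held in REQUEST_FILENAME, and [OR] is a plain or. Here’s a sketch under that simplification, using a throwaway temporary folder as a stand-in document root; the site layout and the function name are made up.

```python
import os
import tempfile


def exists_as_file_or_dir(request_filename: str) -> bool:
    """Rough equivalent of the two conditions joined by [OR]:
        RewriteCond %{REQUEST_FILENAME} -f [OR]
        RewriteCond %{REQUEST_FILENAME} -d
    """
    return os.path.isfile(request_filename) or os.path.isdir(request_filename)


with tempfile.TemporaryDirectory() as docroot:
    # Hypothetical site: one real file and one real directory.
    os.makedirs(os.path.join(docroot, "wp-content"))
    with open(os.path.join(docroot, "logo.png"), "w") as f:
        f.write("not really an image")

    for name in ["logo.png", "wp-content", "hello-world"]:
        request_filename = os.path.join(docroot, name)
        if exists_as_file_or_dir(request_filename):
            print(name, "-> exists, RewriteRule ^ - [L] serves it as is")
        else:
            print(name, "-> not found, carry on to the next rules")
```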

And for the remaining requests (the ones that don’t point to an existing file or directory), do the following:

RewriteRule ^(wp-(content|admin|includes).*) $1 [L]: if the request is for wp-content, wp-admin, or wp-includes, pass the path through as it is ($1 here is the whole matched path).
RewriteRule ^(.*\.php)$ wp/$1 [L]: if the request is for whatever.php, rewrite it as wp/whatever.php. The $1 is a regex backreference: it stands for the text captured by the first pair of parentheses in the pattern.
RewriteRule . index.php [L]: if none of the other rules apply, this catch-all rewrites everything else to index.php (the dot matches any path that has at least one character).
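Python’s re module numbers its groups the same way, counting opening parentheses from the left, so the two backreference rules can be sketched like this. The function names are hypothetical, and the sketch ignores RewriteBase, which would add the /wp/ prefix on a real server.

```python
import re


def rewrite_wp_dirs(path: str):
    r"""Sketch of RewriteRule ^(wp-(content|admin|includes).*) $1 [L]."""
    match = re.search(r"^(wp-(content|admin|includes).*)", path)
    if match is None:
        return None
    # $1 is the OUTER group, i.e. the whole matched path;
    # $2 would be just "content", "admin" or "includes".
    return match.expand(r"\1")


def rewrite_php(path: str):
    r"""Sketch of RewriteRule ^(.*\.php)$ wp/$1 [L] (before RewriteBase is applied)."""
    match = re.search(r"^(.*\.php)$", path)
    if match is None:
        return None
    # mod_rewrite writes the backreference as $1; Python's template syntax is \1.
    return match.expand(r"wp/\1")


print(rewrite_wp_dirs("wp-content/uploads/logo.png"))  # wp-content/uploads/logo.png
print(rewrite_php("wp-login.php"))                     # wp/wp-login.php
print(rewrite_php("about-us"))                         # None
```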

Quite fun, huh? There’s more to learn about .htaccess. Feel free to comment about what you want me to cover next, or share your own .htaccess knowledge so we can all learn from it.

Resources (other than the ones linked in the article)

Official Apache Documentation

Basic htaccess for WordPress

 
