Hotlink Protection

Updated about 5 yrs, 8 mths ago (April 16, 2012). Know a better answer? Let me know!

Protect your files from hotlinking

“Hotlinking” (sometimes called “link hijacking”) means linking directly to a file or resource on a site rather than to the page the site owner intended, and it is a common problem. The most common example is when someone embeds an image in their own page but, rather than hosting the image themselves, links directly to the copy on your site. Visitors to their site are then viewing your image and using your bandwidth to do so, without even knowing it comes from your site.

Referrers

There are two main methods of preventing “hotlinking”, and both have flaws. The most common involves checking referrers. Most browsers send a “referrer” (spelt “Referer” in the HTTP header itself): the address of the page the visitor was viewing immediately before requesting yours. This is generally the page containing the link they followed or, for images embedded in a page, the address of the page they are embedded in. The check is simple: if the referrer is from your site, you allow access to files such as images; if not, you block the request. The main problem is that you generally need to allow blank referrers, because some people send no referrer at all, and a request made by typing an address directly into the browser has none. Some more privacy-conscious people also send fake referrers, in the belief that real ones are a privacy risk.
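The referrer check described above could be sketched in PHP roughly as follows. This is only an illustration: the function name referer_is_allowed() and its $allow_blank flag are assumptions made for the example and do not come from the hotlink.php shown later in this article.

```php
<?php
// Sketch of a referrer check. The function name and the $allow_blank flag
// are illustrative assumptions, not part of the article's hotlink.php.
function referer_is_allowed($referer, $own_host, $allow_blank = true)
{
	// no referrer sent: typed address, bookmark, or a browser that hides it
	if ($referer === null || $referer === '') {
		return $allow_blank;
	}

	// extract the host part of the referring address
	$host = parse_url($referer, PHP_URL_HOST);
	if (!is_string($host)) {
		// unparseable, or no host at all, so treat it as foreign
		return false;
	}
	$host = strtolower($host);
	$own_host = strtolower($own_host);

	// allow the site itself and its subdomains, e.g. www.hardanswers.net
	$suffix = '.'.$own_host;
	return $host === $own_host
		|| substr($host, -strlen($suffix)) === $suffix;
}
```

In a real script the first argument would probably be isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '' and the second your own host name. Leaving $allow_blank at true reflects the weakness noted above: anyone who sends no referrer at all gets through.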

Sessions

The other, more complex way, which I will detail here, involves the use of sessions. It is almost infallible, although it still has its flaws.

The logic behind this idea is simple: when a user first visits your site, you start a session. A session is a simple method of maintaining information about a specific user across a period of time. HTTP itself is stateless, meaning each request is not inherently associated with any other request, so a server-side method is required. Here we use PHP’s built-in session handling.

Having started a session when the user first visited your site, you simply check, every time that user attempts to access a specific type of file (an image, for example), whether or not that user has a valid session. An exception is made for certain robots, such as Google’s Googlebot. For these we simply check that their user-agent matches a list of known robots’ user-agents.

mod_rewrite

We use Apache’s .htaccess and mod_rewrite to rewrite all requests for specific file types, in this case jpg, png, gif, swf and mp3. This means that when a user requests, for example, hardanswers.net/picture.jpg, this is internally rewritten to hardanswers.net/hotlink.php?file=picture.jpg. The .htaccess code used to do this is below:

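# Prevent hotlinking - remember to manually update mime_types in hotlink.php
# (RewriteEngine On is only needed once, if it is not already enabled elsewhere)
RewriteEngine On
# leave requests for hotlink.php itself untouched
RewriteRule ^hotlink\.php - [L]
RewriteRule ^(.+\.(jpg|png|gif|swf|mp3))$ hotlink.php?file=$1 [L]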

PHP

Having rewritten all requests for specific resources to go through your hotlink.php, and assuming you are actually using PHP, you will need to create the hotlink.php file. There’s one small issue with this method. By default, once a session is started, PHP will send headers to ensure that dynamically generated content is not cached. This is not what you want when you’re actually sending non-dynamic content, such as images, so we have to modify the headers sent. The commented code, from hotlink.php, is as follows:

// get the hotlink.php?file=blah
$file = isset($_GET['file']) ? $_GET['file'] : '';

// get the lower-cased file extension, for example "jpg"
$file_ext = strtolower(pathinfo($file, PATHINFO_EXTENSION));

// ensure the $file is a valid file of an allowed type
// one may wish to add further checking, such as check the directory, here
if(!is_file($file) or
	!in_array($file_ext, array('jpg', 'gif', 'png', 'swf', 'mp3')))
{
	// if the file is not a valid file
	// send 404 Not Found header and exit
	header("HTTP/1.1 404 Not Found");
	exit();
}

// get the last modified time of the file
$mtime = filemtime($file);

// format the last modified time into an HTTP compliant format
// example Mon, 22 Dec 2003 14:16:16 GMT
$gmt_mtime = gmdate('D, d M Y H:i:s', $mtime).' GMT';

// send an Etag
header('ETag: "'.md5($mtime.$file).'"');

// check if last modified date is the same as that sent
if(isset($_SERVER['HTTP_IF_MODIFIED_SINCE']))
{
	if ($_SERVER['HTTP_IF_MODIFIED_SINCE'] == $gmt_mtime)
	{
		header('HTTP/1.1 304 Not Modified');
		exit();
	}
}
// check if Etag is the same as that sent
if (isset($_SERVER['HTTP_IF_NONE_MATCH']))
{
	if (str_replace('"', '', stripslashes($_SERVER['HTTP_IF_NONE_MATCH']))
		== md5($mtime.$file))
	{
		header("HTTP/1.1 304 Not Modified");
		exit();
	}
}

// send headers to ensure resource is cached
session_cache_limiter('public');
// 30 * 24 * 60 minutes or one month
session_cache_expire('43200');

// start the session so we have access to session variables
session_start();

// must send this header here to overwrite header produced by session
header('Last-Modified: '.$gmt_mtime);

// check if the session is set or the user is in list of allowed user_agents
// such as Googlebot
if(isset($_SESSION['user']) || 
	(preg_match('/GoogleBot|AltaVista|ia_archive|Inktomi|Lycos|Jeeves|Slurp|Scooter|W3C_Validator/i',
	$_SERVER['HTTP_USER_AGENT']) > 0))
{
	// get the file extension of the file
	// (pathinfo copes with paths or names that contain more than one dot)
	$file_ext = strtolower(pathinfo($file, PATHINFO_EXTENSION));
	
	// set an appropriate mime_type based on the file type
	switch($file_ext)
	{
		case 'jpg':
			$mime_type = 'image/jpeg';
			break;
		case 'gif':
			$mime_type = 'image/gif';
			break;
		case 'png':
			$mime_type = 'image/png';
			break;
		case 'swf':
			$mime_type = 'application/x-shockwave-flash';
			break;
		case 'mp3':
			$mime_type = 'audio/mpeg';
			break;
	}

	// send the mime_type
	header('Content-Type: '.$mime_type);

	// send the file itself
	readfile($file);
}
// session is not set so user is invalid
else
{
	// write a log file of the invalid access attempt
	if (substr($_SERVER['REQUEST_URI'],-5) != '/none')
	{
		// open file
		// saved format will be "path/name.2004.04.hotlink.csv"
		$fp=fopen('path/name.'.gmdate('Y.m').'.hotlink.csv','a');

		// write data
		fwrite($fp,
			$_SERVER['REQUEST_URI'].','
			.date("Y-m-d\TH:i:s").','
			.$_SERVER['REMOTE_ADDR'].','
			.(isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '').','
			.$_SERVER['HTTP_USER_AGENT']."\n"
		);

		// close file
		fclose($fp);
	}

	// overwrite default headers with non-caching headers
	header('ETag: "'.md5(time().$file).'"');
	header('Last-Modified: '.gmdate('D, d M Y H:i:s', time()).' GMT');
	header('Cache-Control: must-revalidate');
	header('Expires: '.gmdate('D, d M Y H:i:s', time()).' GMT');

	// return 403 Forbidden header
	header('HTTP/1.1 403 Forbidden');
}
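The comment near the top of hotlink.php notes that one may wish to add further checking, such as checking the directory. A minimal sketch of such a check is given here, assuming all protected files live under a single base directory. The function name resolve_inside() and the base-directory idea are assumptions for illustration and are not part of the original script.

```php
<?php
// Sketch of a directory check. The function name and the idea of a single
// base directory are assumptions made for illustration.
function resolve_inside($base_dir, $file)
{
	// realpath() resolves "..", "." and symlinks; false if the path is missing
	$base = realpath($base_dir);
	$path = realpath($base_dir.DIRECTORY_SEPARATOR.$file);
	if ($base === false || $path === false) {
		return false;
	}

	// the resolved path must sit below the base directory
	$prefix = rtrim($base, DIRECTORY_SEPARATOR).DIRECTORY_SEPARATOR;
	if (strncmp($path, $prefix, strlen($prefix)) !== 0) {
		return false;
	}

	// only regular files are served, never directories
	return is_file($path) ? $path : false;
}
```

With something like this in place, a request such as hotlink.php?file=../secret.txt would resolve to a path outside the base directory and could be answered with the same 404 response used for missing files.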

The other thing you must do is ensure that a session is created when a user first visits your site. This is surprisingly simple if you made good design decisions when creating your site and can easily add a piece of code to every page. Just add the following code to every page on your site, before any output is sent:

// use this line if you want to use only cookies
// see below for more information
// ini_set('session.use_only_cookies',1);
session_start();
if(!isset($_SESSION['user']))
{
	$_SESSION['user'] = time();
}

Cookies

There is, unfortunately, one major flaw with all of this. By default, PHP will attempt to store the session identifier, generally a large pseudo-random number, in a cookie. Cookies have received some bad press, so some people now consider them a privacy risk and block them. For session cookies this concern is misplaced: a session cookie is simply a way of ensuring that session data is propagated across different pages, and you should never block them. However, because people do block cookies, PHP has a secondary method it can use should cookies fail (controlled by the session.use_trans_sid setting): it adds “?PHPSESSID=bigPseudoRandomNumber” to every link in your page. This works well, because when a user clicks on the link, PHP can gather the session information from it.

Unfortunately, this has one major drawback. Search engines such as Google don’t support cookies, so when they visit your site they will see every link with a large random number appended, and on their next visit they will see the same links with a different random number appended. This will confuse them; they will probably decide your links have all changed, and your site will drop into the nether regions of search-engine-land, never to be seen again. If this is likely to be a problem for your site, you should use the following setting to force PHP to use cookies only. Be aware, though, that when using cookies only, anyone who refuses, or doesn’t support, cookies will not be able to view the protected resources on your site.

To force PHP to use cookie only session handling, either add the line “session.use_only_cookies=1” to a php.ini file in the main root directory, or place “ini_set('session.use_only_cookies',1);” directly before calling “session_start()”.


User submitted comments:

hguhf, about 9 yrs, 4 mths ago
Tuesday August 5, 2008 9:05 PM

i have onarcade games website http://games.jeddahbikers.com
and i will try to hotprotect the SWF files because i'm running out of Bandwidth

Thanks for the .htaccess code

Ned, about 9 yrs, 3 mths ago
Sunday September 7, 2008 1:43 PM

No worries, I hope you find it useful

Jai, about 7 yrs, 2 mths ago
Sunday September 19, 2010 2:21 PM

This is working fine but , When i am playing my mp3 ? Player not play the file ????
Please help

Ned, about 7 yrs, 2 mths ago
Sunday September 19, 2010 2:28 PM

That's correct, the player wouldn't have the cookie required to be able to access the file. Basically, the player will be seen as trying to hotlink the file. No easy way around that with this method, you'd basically have to stop using this to protect those files if you want to play them in third-party players outside the browser.
