Including Link Vault from SSI

September 13th, 2005

When I wrote my previous post about Including Link Vault from Languages other than PHP, I wasn’t aware of a special quirk of Server Side Includes (SSI).

The #include virtual SSI element can be used to include the results of a program executed via a CGI interface. In this case, we use a PHP program. When SSI makes a sub-request to execute the program, it changes all the server variables to reflect the new request except REQUEST_URI. Since the Link Vault client software prefers to grab the url key from the REQUEST_URI, we can use this to our advantage.

Place the following code in a file called wrapperssiXXXXXXXXX.php in the same directory as your lvXXXXXXXXX.php file. Replace the X’s with your LV security code.

<?php
// This is meant to be called via an #include virtual using Server Side Includes(SSI)
// The LV client software gives preference to the REQUEST_URI server variable for determining the correct url key.
// Since SSI leaves REQUEST_URI set to the url of the original SSI page instead of the #include virtual url,
// all that's needed is to include the LV client software and call the DisplayLinks() function.
require_once('lvXXXXXXXXX.php');
print DisplayLinks(5,'',' - ','','');
exit;
?>

To use wrapperssiXXXXXXXXX.php use the following instruction:

<!--#include virtual="/wrapperssiXXXXXXXXX.php" -->

This is a link to an SSI page with Link Vault that uses wrapperssiXXXXXXXXX.php to include links.

The wrapper code I created in my previous post is a general solution for all languages. It accepts the correct Link Vault url key in the LINK_VAULT_UrlKey query string variable. This is necessary because when you include the output of a CGI program from other languages like Perl, REQUEST_URI is updated to reflect the url of the CGI program, not the original Perl script that called it.
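The difference between the two wrappers comes down to where the url key is taken from. Here is a rough sketch of that choice as a single helper. The function name pickUrlKey() and its parameters are hypothetical and are not part of the Link Vault client software; the two arrays stand in for $_SERVER and $_REQUEST.

```php
<?php
// Hypothetical helper, not part of the Link Vault client software.
// $server and $request stand in for $_SERVER and $_REQUEST.
function pickUrlKey(array $server, array $request)
{
    // General wrapper: the caller passes the url key explicitly.
    if (isset($request['LINK_VAULT_UrlKey']) && $request['LINK_VAULT_UrlKey'] !== '') {
        return $request['LINK_VAULT_UrlKey'];
    }
    // SSI wrapper: REQUEST_URI still names the page containing the #include virtual.
    if (isset($server['REQUEST_URI'])) {
        return $server['REQUEST_URI'];
    }
    // Nothing usable was supplied.
    return null;
}
```

Under SSI the second branch is enough, which is why the SSI wrapper above needs no query string at all.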

Including Link Vault From Languages Besides PHP

June 11th, 2005

Edited 9/13/05

See my post on a better method of including Link Vault from SSI.

====

This should work to include the Link Vault client software from virtually any language including SSI.

Place this code in a file called wrapperXXXXXXXXX.php in the same directory as your lvXXXXXXXXX.php file. Replace the X’s with your security code.

<?php
// This is meant to be called via an HTTP request from languages besides PHP
// Assuming you want to display links on http://yoursite.com/mypage.html
// the url to call this file should be:
// http://yoursite.com/wrapperXXXXXXXXX.php?LINK_VAULT_UrlKey=/mypage.html
// LINK_VAULT_UrlKey should be the equivalent of PHP's $_SERVER['REQUEST_URI']
// It should have the full path and query string minus the domain. It starts with a '/'
if (!isset($_REQUEST['LINK_VAULT_UrlKey'])) {
    print 'URL Key not provided';
    exit;
}
$_SERVER['REQUEST_URI']=$_REQUEST['LINK_VAULT_UrlKey'];
require_once('lvXXXXXXXXX.php');
print DisplayLinks(5,'',' - ','','');
exit;
?>

To use wrapperXXXXXXXXX.php with SSI use the following instruction:

<!--#include virtual="/wrapperXXXXXXXXX.php?LINK_VAULT_UrlKey=${REQUEST_URI}" -->

This is an example page using SSI.
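When the url key itself contains a query string, it has to be URL-encoded before it is placed in LINK_VAULT_UrlKey, otherwise its own "?" and "&" characters get mixed up with the wrapper's query string. A calling script in any language needs to build a request url of the following shape. The sketch is written in PHP only to stay consistent with the rest of this page, and buildWrapperUrl() is a hypothetical name, not something shipped with Link Vault.

```php
<?php
// Hypothetical helper showing the shape of the wrapper request url.
function buildWrapperUrl($siteBase, $wrapperFile, $urlKey)
{
    // rawurlencode() escapes '/', '?', '=' and '&' so the whole key
    // travels as one query string value.
    return rtrim($siteBase, '/') . '/' . ltrim($wrapperFile, '/')
        . '?LINK_VAULT_UrlKey=' . rawurlencode($urlKey);
}

// Example call; PHP decodes the value again before the wrapper reads $_REQUEST.
$exampleUrl = buildWrapperUrl('http://yoursite.com', 'wrapperXXXXXXXXX.php', '/mypage.html?id=5&x=1');
```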

Standards Compliant Solution for 302 Web Page Hijacking

May 30th, 2005

The HTTP Content-Location header is a standards compliant solution for the 302 web page hijacking problem. Depending on how it’s evaluated, it can be effective as either an HTTP header, or as an HTML meta tag. If two different locations provide conflicting Content-Location data, then priority should be given to the information that was most likely generated by the person that controls the content.

It’s only effective if the Content-Location header from the domain that actually serves the content overrides any Content-Location header served with a redirect.

For example:

Googlebot finds a 302 redirect on hijacker.com that points to a page on v1.magicbeandip.com. Hijacker.com includes a Content-Location header pointing to a url on hijacker.com with the redirect, trying to fool Googlebot into thinking that the content actually does belong to them.

Googlebot follows the redirect to the page on v1.magicbeandip.com. My server actually serves the page content, but also includes a Content-Location header that points to a url on v1.magicbeandip.com. Since my server actually served the content, it’s more likely that my Content-Location header is correct, so my header should be used to determine the canonical url of the content.

Virtually no existing web pages currently have a Content-Location header, so Googlebot can’t allow the canonical url of pages that don’t contain a Content-Location header to be changed by other domains. A Content-Location header should only be able to successfully specify a different canonical url if both urls provide a Content-Location header pointing to the same place. So in order for content from v1.magicbeandip.com to be canonicalized to hijacker.com, both domains have to send Googlebot Content-Location information pointing to the same place.
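The priority rules above can be sketched as a small decision function. This is only one reading of the proposal, not a description of how Googlebot or any real crawler behaves. The function name and parameters are hypothetical, and the sketch assumes every Content-Location value is an absolute url.

```php
<?php
// Hypothetical sketch of the proposed rules; assumes absolute urls throughout.
// $servedUrl:       url of the page that actually returned the content
// $servedLocation:  Content-Location sent with that content, or null
// $redirectLocation: Content-Location sent with the 302 redirect, or null
function resolveCanonicalUrl($servedUrl, $servedLocation, $redirectLocation)
{
    // No header from the content's own server: other domains can't change anything.
    if ($servedLocation === null) {
        return $servedUrl;
    }
    $servedHost   = (string) parse_url($servedUrl, PHP_URL_HOST);
    $locationHost = (string) parse_url($servedLocation, PHP_URL_HOST);
    // Same domain: the header from the server that served the content wins.
    if (strcasecmp($servedHost, $locationHost) === 0) {
        return $servedLocation;
    }
    // Different domain: honored only when both sides point to the same place.
    if ($redirectLocation !== null && $redirectLocation === $servedLocation) {
        return $servedLocation;
    }
    return $servedUrl;
}
```

In the hijacking example, the header sent by hijacker.com is ignored because my server's own header points to a url on v1.magicbeandip.com.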

HTML Content-Location Meta Tag

In order to be effective, the Content-Location data from the person that controls the content must be used. Following this train of thought, it’s logical to recognize Content-Location data from an HTML meta tag as well.

<meta http-equiv="Content-Location" content="http://v1.magicbeandip.com/mycontent.html">

Since it’s very easy for the person controlling the content to create a meta Content-Location tag, this information should be given higher priority than a Content-Location header served by either the same or a different domain. And again, it can’t specify a url on a different domain unless both domains agree. If hijacker.com includes a Content-Location meta tag with a Refresh meta tag, then my domain must provide the same Content-Location information in order for it to be successful.

It Works, But There’s an Easier Way

Specifying the full url with a Content-Location header works, but adds another level of complexity to managing websites. In order to simplify things, I suggest creating a way to specify only the domain the content belongs to, instead of the full url. Determining the correct canonical url would be left up to Googlebot as long as the result is in the specified domain.

For example you could use an “X-Content-Domain” header and meta tag that would be prioritized the same way as I’ve outlined for the Content-Location header. The intent being to give priority to the information that was most likely provided by the person in control of the content.

If I had to choose one or the other, I think I would choose X-Content-Domain over Content-Location.

Update to Link Vault Setup Software

May 30th, 2005

Updated 9/13/05:

Link Vault Client Software Beta Test

Download lvSecurityCode.zip version 1.4.01-5122117.

Make backup copies of your XXXXXXXXX.txt data file and lvXXXXXXXXX.php files before installing this version. It changes the data file format so the backups will make switching back (if needed) to the old version much easier.

Follow these instructions at the top of the file:
// Change ##SecurityCode## to match the Link Vault security code for your server.
// Change ##SiteUrl## to match the "Site Url" you specified for this site in your Link Vault settings.
// Change ##lvFolder## to match the "Script Folder" you specified for this site in your Link Vault settings.
// If you haven't specified a Script Folder in the Link Vault settings, change it to ''.

Replace your existing lvXXXXXXXXX.php with this one.

Optionally create a file called lvXXXXXXXXXlog.txt with correct permissions in the same directory as your data file and errors will be written to the file regardless of test mode state. The file length is truncated to 25K when it reaches 50K in size.
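The log size rule can be sketched roughly as follows. This is not the client's actual code: the function name and parameters are hypothetical, and keeping the newest 25K (rather than the oldest) is an assumption, since the rule above only gives the two sizes.

```php
<?php
// Hypothetical helper; names and the "keep the newest bytes" choice are assumptions.
function appendToCappedLog($path, $message, $maxBytes = 51200, $keepBytes = 25600)
{
    clearstatcache(); // file status results are cached within a request
    // Logging is optional: do nothing unless the log file was created in advance.
    if (!is_file($path) || !is_writable($path)) {
        return false;
    }
    // Once the file reaches the limit, cut it down to its last $keepBytes bytes.
    if (filesize($path) >= $maxBytes) {
        $contents = file_get_contents($path);
        file_put_contents($path, substr($contents, -$keepBytes), LOCK_EX);
    }
    // Append the new entry on its own line.
    return file_put_contents($path, $message . "\n", FILE_APPEND | LOCK_EX) !== false;
}
```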

To get it to reset your data file, you will have to access a web page twice. It should start displaying links on the third access.

Change Log:
Release 5122117

  • BUG: The php filesize() function is cached and doesn’t always return the actual file size. filesize() was being used to determine how many bytes to read from the VarFile. If other processes increased the filesize before this process got a file lock, the end of the VarFile was being truncated, causing a checksum error.
  • FIX: 102400 bytes are always added to the length reported by filesize() to ensure that the EOF is reached.
  1. Put newurl and newurlbatch actions back
  2. Added LINK_VAULT_DefaultMaxPages constant for installs with more than 5000 pages
  3. Put the php processing instruction end bracket back
  4. Added LINK_VAULT_Debug constant. Debug mode is on when defined, regardless of value.
  5. Added VarFileLock() and VarFileUnlock()
  6. Added VarFileRead() and VarFileWrite(). These verify the file is locked. All VarFile I/O uses these functions.
  7. Added _VarReadConvertShortData() to remove redundant code in _VarReadAll & _VarReadShort
  8. Added _VarSaveShortDataWrite() to remove redundant code in _VarSaveAll & _VarSaveShort
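The filesize() problem in the BUG entry for release 5122117 can also be avoided by not asking for the size at all. The sketch below is a different technique from the fix listed above (which pads the length reported by filesize()): it takes a lock and reads until EOF. The function name is hypothetical and this is not the client's VarFileRead().

```php
<?php
// Hypothetical alternative to sizing the read with filesize().
function readWholeFileLocked($path)
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return false;
    }
    // Shared lock so a writer holding an exclusive lock finishes first.
    if (!flock($handle, LOCK_SH)) {
        fclose($handle);
        return false;
    }
    $data = '';
    // Read in chunks until EOF instead of trusting a possibly stale size.
    while (!feof($handle)) {
        $chunk = fread($handle, 8192);
        if ($chunk === false || $chunk === '') {
            break;
        }
        $data .= $chunk;
    }
    flock($handle, LOCK_UN);
    fclose($handle);
    return $data;
}
```

If the size is needed for some other reason, calling clearstatcache() before filesize() discards PHP's cached value.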

Release 5091311

  1. Client errors are now only logged to the optional log file and returned to LV server requests. They are never displayed on a page load regardless of the Test Mode setting.
  2. php error_reporting() is set to 0 during the DisplayLinks() function.
  3. Added utf-8 logic provided by expat. Call DisplayLinksUtf8() instead of DisplayLinks() to detect and convert iso links to utf8.
  4. Removed some of the info displayed by the getinstallinfo action.
  5. FindFile() Fixed problem with fallback second search loop used when LINK_VAULT_Folder != ‘’
  6. Set ignore_user_abort() to true when writing to data file to prevent possible corruption issues.
  7. Changed log file size limit to 50K. The file is truncated to 25K when it reaches the limit.
  8. Removed the php end processing bracket (’?’ followed by ‘>’) at the end of this file to eliminate problems with blank lines following the bracket being introduced during installation.
  9. UrlFindKey() The url is now listed in the error log when invalid urls are found.
  10. $LINK_VAULT_Object is now unset upon completion of ServerLoop()
  11. Removed newurl and newurlbatch actions. They are no longer used by LV servers.

Release 5090510

  1. _parse_str() and UrlFindKey() were adjusted to differentiate between empty vars with and w/o an ‘=’. A query string of ?var1=&var2 was being reported as ?var1&var2
  2. Test Mode now defaults to off to eliminate error message signature.
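The distinction in item 1 of release 5090510 (a variable with an empty value versus a variable with no "=" at all) is lost by PHP's built-in parse_str(), which gives both an empty string. A minimal sketch of a parser that keeps the difference is shown below. These functions are hypothetical and are not the client's _parse_str() or UrlFindKey().

```php
<?php
// Hypothetical parser. Returns name => value, where null means the
// name appeared without an '=' and '' means it had an empty value.
function parseQueryKeepingBareNames($query)
{
    $vars = array();
    foreach (explode('&', ltrim($query, '?')) as $pair) {
        if ($pair === '') {
            continue;
        }
        $parts = explode('=', $pair, 2);
        $vars[urldecode($parts[0])] = isset($parts[1]) ? urldecode($parts[1]) : null;
    }
    return $vars;
}

// Rebuilds the query string, writing '=' only where the original had one.
function buildQueryKeepingBareNames(array $vars)
{
    $pairs = array();
    foreach ($vars as $name => $value) {
        $encodedName = rawurlencode((string) $name);
        $pairs[] = $value === null ? $encodedName : $encodedName . '=' . rawurlencode($value);
    }
    return $pairs ? '?' . implode('&', $pairs) : '';
}
```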

Release 5090207

  1. VarFileTestId() - If a RESET_ME attempt fails, a delay is observed until the next attempt. This prevents LV servers from being overloaded with “resetsite” requests. An error message is now displayed to indicate that this delay is being observed. Previously, VarFileTestId() failed without explanation in this case.

Release 5090117

  1. LINK_VAULT_ShutdownResetMe() - Now correctly recognizes LINK_VAULT_Folder.

Release 5083114

  1. An empty string and ‘Site Requires Activation’ are now valid replies from the LV server for givemealllinks, givemeallplacements and givemeallurls. This fixes a problem with freshly added sites.
  2. Made an attempt at friendlier error messages in VarFileTestId()

Release 5082915

  1. The RESET_ME process, used when the varfile is corrupt or when updating from an old data file version, has been redesigned.
    • VarFileTestId() now uses register_shutdown_function() to start a separate lv*.php process to perform the site reset
    • LINK_VAULT_SiteUrl contains the domain used to access lv*.php and is used by the shutdown function.
    • The time is saved with the RESET_ME tag when a reset is initiated.
    • $this->InitiateResetDelay seconds must pass after a failed RESET_ME attempt before trying again.
    • TestMode defaults to off after the RESET_ME process.
  2. ActionGetInstallInfo() displays several new parameters.
  3. ServerLoop() - Added ignore_user_abort() and now attempts to set max_execution_time to 900.
  4. Removed LastResetInitiated from VarFile (Now stored with RESET_ME tag)
  5. Changed var name $isResetWanted to $isResetTagWanted
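The reset delay introduced in release 5082915 can be illustrated with a short sketch. The tag format "RESET_ME:<unix time>" and both function names are assumptions made for the example; the real VarFile layout is not described in this post.

```php
<?php
// Hypothetical tag format "RESET_ME:<unix time>"; the real layout may differ.
function makeResetTag($now)
{
    return 'RESET_ME:' . (int) $now;
}

// True when a reset is pending and enough time has passed since the last attempt.
function isResetAttemptAllowed($fileStart, $now, $delaySeconds)
{
    if (strpos($fileStart, 'RESET_ME') !== 0) {
        return false; // no reset pending
    }
    $parts = explode(':', $fileStart, 2);
    $lastAttempt = isset($parts[1]) ? (int) $parts[1] : 0;
    return ($now - $lastAttempt) >= $delaySeconds;
}
```

Storing the time inside the tag means one short read answers both questions: whether a reset is pending and whether it is too soon to retry.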

Release 5071810

  1. Varfile now stays locked from the time the ‘RESETME’ tag is discovered until the VarFile reset is complete
  2. Added UrlKey validity checks. Excluding the quotes: UrlKey must start with “/”, must not contain “//”,”/../”,”/./”,”#” and must not end with “?”,”/.”
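The UrlKey rules listed in release 5071810 translate fairly directly into code. The function below is a hypothetical sketch of those rules, not the client's own implementation, which may order or report the checks differently.

```php
<?php
// Hypothetical helper implementing the listed UrlKey rules.
function isValidUrlKey($urlKey)
{
    // Must start with "/".
    if (!is_string($urlKey) || $urlKey === '' || $urlKey[0] !== '/') {
        return false;
    }
    // Must not contain any of these sequences.
    foreach (array('//', '/../', '/./', '#') as $forbidden) {
        if (strpos($urlKey, $forbidden) !== false) {
            return false;
        }
    }
    // Must not end with "?" or "/.".
    foreach (array('?', '/.') as $suffix) {
        if (substr($urlKey, -strlen($suffix)) === $suffix) {
            return false;
        }
    }
    return true;
}
```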

Release 5062022

  1. Changed InitiateResetDelay from 15 minutes to 5 minutes
  2. Changed ShortDelayNormal from 1 minute to 5 minutes
  3. MaxPages and TestMode can be set with a query string variable when calling the following actions: getnewurls, newurl, newurlbatch, resetsite, settestmode

Release 5061807

  1. Url count wasn’t getting updated properly in ActionNewUrl() and ActionNewUrlBatch() - fixed.
  2. Made allowances for magic_quotes_gpc and magic_quotes_runtime.

Release 5061613

  1. Added detection of failed connection to LV servers when using fsockopen().
  2. Added ‘getinstallinfo’ action to display basic diagnostic information about the client software installation.
  3. Place the error log file in the same directory as your data file. lv*.php now uses the same logic to search for the data file and log file.
  4. The error log file is now truncated to 15Kbytes when it reaches a size of 30Kbytes.

Release 5061422

  1. The version & release numbers are now included with every request to LV servers.
  2. More updates to query string handling.

Release 5061314

  1. Fix bug involving query strings in UrlKey

Release 5061307

  1. Added a check for a valid HTTP status code in the reply from the LV server when using fsockopen() in HttpOpen()
  2. Fixed bug that prevented ActionGiveMeAllNewUrls() from being called
  3. New urls are now removed from $this->UrlListNew when the LV server returns them instead of when they are sent to LV
  4. Added sanity checks for data received from LV for some of the givemeall* requests.
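With fsockopen() the reply arrives as raw text, so the status code has to be pulled out of the first line by hand. A minimal sketch of such a check follows. The function name is hypothetical and HttpOpen() may do this differently.

```php
<?php
// Hypothetical helper: returns the status code as an int, or false
// when the line is not a well-formed HTTP status line.
function parseHttpStatusCode($statusLine)
{
    // trim() removes the trailing CRLF left by fgets().
    if (preg_match('/^HTTP\/\d\.\d\s+(\d{3})(\s|$)/', trim($statusLine), $matches)) {
        return (int) $matches[1];
    }
    return false;
}
```

A caller would read the first line of the reply with fgets() and treat false, or any code other than the one it expects, as a failed request.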

Release 5061220

  1. Bug fix regarding urls with query strings.

Release 5061122

  1. New file format allows faster short data reads and writes. The entire data file is read/written only if necessary.
  2. Session Id is stripped from urls instead of refusing the url completely.
  3. Url fragments (after a #) are stripped from urls.
  4. Diagnostic error messages are displayed on screen if test mode is on. No errors are displayed with test mode off.
  5. Create a file called lvXXXXXXXXXlog.txt with correct permissions in the default directory and errors will be written to the file regardless of test mode state. The file length is randomly truncated to 20k.
  6. Automatically does a resetsite action if corrupt data is detected. This is done in two page loads: #1 write ‘RESET_ME’ over beginning of file. #2 if ‘RESET_ME’ is found at the beginning of the file, a resetsite action is performed. This ensures that the file is truly writable before data is requested from LV servers.
  7. Supports a ‘newurlbatch’ action when implemented by LV servers. Instead of getting one url at a time from LV servers, several new urls can be downloaded in a single batch.
  8. getscriptversion action now returns the release number if ‘release’ is set in the query string
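Items 2 and 3 of release 5061122 (stripping session ids and url fragments) can be sketched as below. The function name is hypothetical, and the default session parameter name PHPSESSID is an assumption; the list above does not say which parameter names the client recognizes.

```php
<?php
// Hypothetical helper; the session parameter list is an assumption.
function stripFragmentAndSession($url, $sessionParams = array('PHPSESSID'))
{
    // Drop everything from the first '#' onward.
    $hashPos = strpos($url, '#');
    if ($hashPos !== false) {
        $url = substr($url, 0, $hashPos);
    }
    $queryPos = strpos($url, '?');
    if ($queryPos === false) {
        return $url;
    }
    $path = substr($url, 0, $queryPos);
    $kept = array();
    // Keep every query pair except the session id parameters.
    foreach (explode('&', substr($url, $queryPos + 1)) as $pair) {
        if ($pair === '') {
            continue;
        }
        $parts = explode('=', $pair, 2);
        if (!in_array($parts[0], $sessionParams, true)) {
            $kept[] = $pair;
        }
    }
    return $kept ? $path . '?' . implode('&', $kept) : $path;
}
```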

Link Vault Setup Code Update 1.3.06-5052709

May 27th, 2005

This is a beta release 1.3.06-5052709 of the php Link Vault setup code that uses the fsockopen() function instead of the fopen() function to access the Link Vault server.

To try it out, you’ll have to edit the following two lines of code at the top of the file to include your Link Vault settings, and change the filename to match your security code. Don’t forget to change the filename extension from .phps to .php

// Change ##SecurityCode## to match the Link Vault security code for your server.
define('LINK_VAULT_Code','##SecurityCode##');
// Change ##lvFolder## to match the "Script Folder" you specified for this site in your Link Vault settings.
// If you haven't specified a Script Folder in the Link Vault settings, change it to ''.
define('LINK_VAULT_Folder','##lvFolder##');

Download lv1.3.06-5052709.phps

Edited 5/30/05 - Updated to newer version.

Deleting Expired Announcements in phpWebsite

May 8th, 2005

phpWebsite announcements have an expiration date used to determine when to stop displaying them. After they stop displaying, the announcements simply remain in the database until they are deleted manually. This is my solution to delete expired announcements automatically.

Rather than include something in the phpWebsite code that checks for expired records every time the announcements module is loaded, I created a separate file to execute once per day via my server’s crontab. It would have been simplest to just do a few sql queries to delete the records directly from the database. But, I’m not that familiar with the database structure and relationships, so it would have required a lot of research to make sure I was doing it correctly.

Instead, I copied code from existing phpWebsite files to create a separate execution loop to use the announcements module to delete the announcements. I ended up combining code from /index.php, /mod/announce/class/Announcement.php and /mod/announce/class/AnnouncementManager.php.

AnnDeleteExpired.phps is the result of my efforts.
AnnDeleteExpiredCron.pl.txt is the perl script I use to call it from my crontab.

AnnSyncWithFatCat.phps was my first experiment with this method. It is the same as pressing the “Sync With FatCat” button in the announcements module.

All of these files are for use with phpWebsite 0.9.3-4.