<?xml version="1.0" encoding="UTF-8"?> <rss
version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
> <channel><title>irama.org &#187; PHP</title> <atom:link href="http://irama.org/news/category/technology/web/development/php/feed/" rel="self" type="application/rss+xml" /><link>http://irama.org</link> <description>the web and I</description> <lastBuildDate>Tue, 08 Nov 2011 11:00:36 +0000</lastBuildDate> <language>en</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <generator>http://wordpress.org/?v=3.2.1</generator> <item><title>Harvest the web</title><link>http://irama.org/news/2010/04/05/harvest-the-web/</link> <comments>http://irama.org/news/2010/04/05/harvest-the-web/#comments</comments> <pubDate>Mon, 05 Apr 2010 02:46:42 +0000</pubDate> <dc:creator>Andrew Ramsden</dc:creator> <category><![CDATA[PHP]]></category> <guid
isPermaLink="false">http://irama.org/?p=493</guid> <description><![CDATA[The last few weeks I&#8217;ve been trying to find ways to interact more easily with the Steam Community data that is exposed for all Groups and Users with public profiles. I was frustrated by the fact that Valve have not publicised an official API for interacting with this data and that the unofficial efforts failed [...]]]></description> <content:encoded><![CDATA[<p>The last few weeks I&#8217;ve been trying to find ways to interact more easily with the <a
href="http://steamcommunity.com/">Steam Community</a> data that is exposed for all Groups and Users with public profiles. I was frustrated by the fact that <a
href="http://forums.steampowered.com/forums/showpost.php?p=13668755&#038;postcount=1">Valve have not publicised an official API</a> for interacting with this data and that the <a
href="http://steamcommunity.com/groups/scapi">unofficial efforts</a> failed to meet the scope I was looking for &mdash; not to mention being badly broken due to changes to the HTML of the target website.</p><p>My initial thought was to follow a model similar to this <a
href="http://code.google.com/p/steamcommunityapi/">new project</a>. But this approach leaves a number of common scraping problems unresolved:</p><ol><li>No caching. Each time data is required, the code requests the source HTML from the target URL again</li><li>Linear performance. Each time data is required, the code must process the HTML into API objects</li><li>Relies on well-formed XML. If PHP&#8217;s SimpleXML extension receives tag soup, the solution will fail</li><li>Complex code to maintain. When the target website changes the structure of its HTML, most of the API code needs a complete rewrite</li></ol><h2>Enter the Reaper</h2><p
class="feature-thumb alt"><a
href="/web/reaper/"><img
src="/assets/images/reaper/reaper.png" alt="" /></a></p><p>To address these issues, I have been developing <a
href="/web/reaper/">Reaper</a>, currently a PHP implementation that doesn&#8217;t require any extensions or external libraries. Reaper attempts to condense the common tasks of scraping into small blocks of efficient code and to cache the results transparently for best performance:</p><ol><li>Reaper requests the URL (via <a
href="http://developer.yahoo.com/yql/">YQL</a>). The returned HTML is <a
href="http://www.w3.org/People/Raggett/tidy/">tidied</a> into well-formed XML and cached</li><li>Reaper accepts your data definition array which maps data labels to <a
href="http://www.w3.org/TR/xpath/">XPath</a> queries, <a
href="http://www.regular-expressions.info/">regular expressions</a> and/or <a
href="http://php.net/callback#language.types.callback">callback functions</a> to scrape the relevant data</li><li>Reaper caches the resulting data object and returns it to you</li></ol><p>There&#8217;s more work to do to improve error-handling and documentation, but so far I&#8217;m pretty pleased with the results.</p><p>Meanwhile, I&#8217;ve stumbled onto <a
href="http://developer.valvesoftware.com/wiki/Steam_Condenser">Steam Condenser</a>, so I may not need to roll my own Steam Community API after all <img
src='http://irama.org/wp/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /></p><p>I&#8217;m keen to hear suggestions and feedback, so let me know what you think <a
href="/news/2010/04/05/harvest-the-web/#respond">as a comment</a>, using the <a
href="/contact/">contact form</a> or <a
href="http://twitter.com/airama">on Twitter</a>.</p> ]]></content:encoded> <wfw:commentRss>http://irama.org/news/2010/04/05/harvest-the-web/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>My first public WordPress plugins</title><link>http://irama.org/news/2010/03/22/my-first-public-wordpress-plugins/</link> <comments>http://irama.org/news/2010/03/22/my-first-public-wordpress-plugins/#comments</comments> <pubDate>Mon, 22 Mar 2010 09:00:10 +0000</pubDate> <dc:creator>Andrew Ramsden</dc:creator> <category><![CDATA[iPhone]]></category> <category><![CDATA[PHP]]></category> <category><![CDATA[WP Plugins]]></category> <guid
isPermaLink="false">http://irama.org/?p=436</guid> <description><![CDATA[I&#8217;ve been playing with WordPress for a fair while now. Hacking together themes, trying and modifying existing plugins, writing my own simple plugins from time to time. Generally doing too much in the theme files, and not enough abstraction into proper plugins. Recently I&#8217;ve decided it&#8217;s time to bite the bullet and formalise some of [...]]]></description> <content:encoded><![CDATA[<p>I&#8217;ve been playing with <a
href="/web/cms/wordpress/">WordPress</a> for a fair while now. Hacking together themes, trying and modifying existing plugins, writing my own simple plugins from time to time. Generally doing too much in the theme files, and not enough abstraction into proper plugins.</p><p>Recently I&#8217;ve decided it&#8217;s time to bite the bullet and formalise some of these hacks, so I bring you my first two <a
href="/web/cms/wordpress/plugins/">public WordPress plugins</a>:</p><ul><li><h3><a
href="http://irama.org/web/cms/wordpress/plugins/custom-default-avatar/">Custom default avatar</a></h3><p>Plain vanilla WordPress provides a list of default avatars to choose from, but doesn&#8217;t allow you to choose an image of your own making. This plugin allows you to specify your own default avatar.</p></li><li><h3><a
href="http://irama.org/web/cms/wordpress/plugins/custom-app-icons/">Custom app icons</a></h3><p>This plugin allows you to specify icon(s) to be used when iPhone / iPod Touch users create a shortcut to your site using the &#8216;Add to Home Screen&#8217; function in Safari.</p></li></ul><p>I hope you find these useful, and eventually I will think about submitting them for inclusion in the <a
href="http://wordpress.org/extend/plugins/">WordPress.org plugin directory</a>. Before I do however, I&#8217;d appreciate any and all feedback, for example: any functionality limitations, plugin faux pas, coding style issues, etc&#8230;</p><p>Let me know what you think!</p> ]]></content:encoded> <wfw:commentRss>http://irama.org/news/2010/03/22/my-first-public-wordpress-plugins/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Slightly nicer URLs</title><link>http://irama.org/news/2009/10/17/slightly-nicer-urls/</link> <comments>http://irama.org/news/2009/10/17/slightly-nicer-urls/#comments</comments> <pubDate>Sat, 17 Oct 2009 05:32:23 +0000</pubDate> <dc:creator>Andrew Ramsden</dc:creator> <category><![CDATA[Identifiers]]></category> <category><![CDATA[PHP]]></category> <guid
isPermaLink="false">http://irama.org/?p=328</guid> <description><![CDATA[As we know, all unique online resources should be addressable with a unique URL. However, not all URLs were created equal. Some URLs are &#8220;nicer&#8221; than others. For example, URLs with query string parameters are often considered to belong to the &#8220;not so nice&#8221; URL category: http://example.com/?p=1234&#038;vH=10&#038;Session_ID=er5DKJn838JK2dfs In general, what I consider to be &#8220;nice&#8221; [...]]]></description> <content:encoded><![CDATA[<p>As we know, all unique online resources should be addressable with a unique URL.</p><p>However, not all URLs were created equal. Some URLs are &#8220;nicer&#8221; than others. For example, URLs with query string parameters are often considered to belong to the &#8220;not so nice&#8221; URL category: <samp>http://example.com/?p=1234&#038;vH=10&#038;Session_ID=er5DKJn838JK2dfs</samp></p><p>In general, what I consider to be &#8220;nice&#8221; or &#8220;not so nice&#8221; URLs is a lengthy topic, and I&#8217;ll only touch on part of it today. Suffice it to say that, for some purposes, I believe using query string parameters is not the worst crime you can commit. In fact, in some cases, I believe they are perfectly acceptable.</p><p>Take the following URL for instance: <samp>http://example.com/books/?format=html&#038;order=alphabetical&#038;page=2</samp>. Although query string parameters mean this URL is a little tricky to read, at least it uses human-readable parameter keys and values. And because slashes <samp>/</samp> in URLs imply hierarchy, the only good alternative for this type of URL would be a <a
href="http://www.w3.org/DesignIssues/MatrixURIs.html">Matrix URL</a>, like this: <samp>http://example.com/books/;format=html;order=alphabetical;page=2</samp>.</p><p>Implementing Matrix URLs within web applications can be difficult, requiring extra server-side redirects or client-side trickery because, by default, an HTML form won&#8217;t submit data formatted as a Matrix URL.</p><p>That&#8217;s why I believe query strings aren&#8217;t so bad; sometimes they really come in handy.</p><h2>Repeated parameters</h2><p>That said, when using checkboxes (or, heaven forbid, multi-select controls) to submit data using the GET method, some server-side languages (like PHP) require that you add <code
class="xhtml">[]</code> to the end of the name attribute of each control, for example: <code
class="xhtml">&lt;input type="checkbox" name="items[]" value="item1" />&lt;input type="checkbox" name="items[]" value="item2" /></code></p><p>For my money, this results in &#8220;not so nice&#8221; URLs, for example: <samp>http://example.com/books/?items[]=item1&#038;items[]=item2</samp></p><p>I know it&#8217;s a subtle difference, but I much prefer: <samp>http://example.com/books/?items=item1&#038;items=item2</samp></p><p>The other benefit is that your HTML wouldn&#8217;t need to contain the <code
class="xhtml">[]</code> either: <code
class="xhtml">&lt;input type="checkbox" name="items" value="item1" />&lt;input type="checkbox" name="items" value="item2" /></code></p><h2>A problem</h2><p>The problem is, by default, if <code
class="xhtml">[]</code> doesn&#8217;t appear in your URLs, only the last &#8216;items&#8217; parameter will be accessible to PHP in the <code
class="php">$_GET</code> array.</p><h2>A solution</h2><p> I spent some time thinking about this, and decided the best thing to do would be to parse the URL myself.</p><pre><code class="php">/**
 * Returns query string parameters more intelligently from the URL than by using the $_GET array.
 *
 * When multiple parameters are encountered with the same name, they are stacked into an
 * array. This means all URL data can be accessed without using brackets in name attributes.
 * For example, typically you would use: &lt;input name="items[]" /> resulting in &#038;items[]=id1&#038;items[]=id2
 * However, using this method you can use: &lt;input name="items" /> resulting in &#038;items=id1&#038;items=id2
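 *
 * A sketch of the result for such a URL (the example.com address is illustrative only):
 *   getURLVariables('http://example.com/books/?items=id1&#038;items=id2')
 *   // returns array('items' => array('id1', 'id2'))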
 *
 * @author Andrew Ramsden
 * @see http://irama.org/news/2009/10/17/slightly-nicer-urls/
 * @license GNU GENERAL PUBLIC LICENSE (GPL) &lt;http://www.gnu.org/licenses/gpl.html>
 *
 * @param String $url (optional) A URL to parse for query string variables. If not set, the
 *        current requested URI will be parsed.
 * @return Array An associative array with all query string variables. Multiple parameters
 *         are stacked into a nested array.
 */
function getURLVariables ($url='') {
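	$url = !empty($url) ? parse_url($url) : parse_url($_SERVER['REQUEST_URI']);
	$result = array();
	if (!isset($url['query']) || $url['query'] === '') return $result; // no query string (or unparseable URL)
	$queryStrParams = explode('&#038;',$url['query']);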
	foreach ($queryStrParams as $param) {
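		$paramKeyVals = explode('=',$param, 2);
		if ($paramKeyVals[0] === '') continue; // skip empty params, e.g. from a trailing separator
		$key = urldecode($paramKeyVals[0]); // decode names and values, as PHP does for $_GET
		$val = isset($paramKeyVals[1])?urldecode($paramKeyVals[1]):'';
		if (substr($key,-2) == '[]') { // support ugly urls too (literal [] or encoded %5B%5D)
			$result[substr($key,0,-2)][] = $val;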
		} else if (!isset($result[$key])) { // add new param to the results array
			$result[$key] = $val;
		} else { // this param already exists, stack into an array
			if (is_array($result[$key])) {
				$result[$key][] = $val; // add to existing array
			} else {
				$result[$key] = array($result[$key], $val); // create new array
			}
		}
	}
	return $result;
}</code></pre><p>Now instead of using: <code
class="php">$items = $_GET['items'];</code> you can use <code
class="php">$items = getURLVariables()['items'];</code> (this shorthand needs PHP 5.4 or later; on older versions, assign the returned array to a variable first) and access all the data from your <em>slightly nicer URLs</em>.</p><p>Feedback appreciated, let me know what you think.</p> ]]></content:encoded> <wfw:commentRss>http://irama.org/news/2009/10/17/slightly-nicer-urls/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>To die if necessary (weapons of mass abstraction)</title><link>http://irama.org/news/2006/04/18/to-die-if-necessary/</link> <comments>http://irama.org/news/2006/04/18/to-die-if-necessary/#comments</comments> <pubDate>Tue, 18 Apr 2006 11:00:02 +0000</pubDate> <dc:creator>Andrew Ramsden</dc:creator> <category><![CDATA[General]]></category> <category><![CDATA[Movies]]></category> <category><![CDATA[PHP]]></category> <category><![CDATA[Web]]></category> <guid
isPermaLink="false">http://tinymac.com/?p=40</guid> <description><![CDATA[I&#8217;m sure I&#8217;m not the first to notice this: From the PHP manual: &#8230;Also note that it is your responsibility to die() if necessary&#8230; Obviously I missed the memo where PHP developers were sworn to a secret pact where, among other duties, we are required to perform acts of international espionage and die() if necessary. [...]]]></description> <content:encoded><![CDATA[<p>I&#8217;m sure I&#8217;m not the first to notice this:</p><blockquote><p><a
href="http://www.php.net/manual/en/function.set-error-handler.php">From the PHP manual</a>:</p><p>&#8230;Also note that it is your responsibility to <a
href="http://www.php.net/manual/en/function.die.php">die()</a> if necessary&#8230;</p></blockquote><p>Obviously I missed the memo where PHP developers were sworn to a secret pact where, among other duties, we are required to perform acts of international espionage and <em>die() if necessary</em>.</p><p>I guess it came between the whitepapers <em>Karate <a
href="http://www.php.net/manual/en/function.chop.php">chop()</a>: land the death blow</em> and <em>Rigged to <a
href="http://www.php.net/manual/en/function.explode.php">explode()</a>: Dangerous things 101</em>.</p> ]]></content:encoded> <wfw:commentRss>http://irama.org/news/2006/04/18/to-die-if-necessary/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> </channel> </rss>
