<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>TagWalk Blog</title>
	<atom:link href="http://blog.tagwalk.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.tagwalk.com</link>
	<description>Taking a sneaky peek into Twitter</description>
	<lastBuildDate>Wed, 01 Feb 2012 12:52:30 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.3</generator>
		<item>
		<title>Experimental Foursquare Recommendations</title>
		<link>http://blog.tagwalk.com/2010/08/experimental-foursquare-recommendations/</link>
		<comments>http://blog.tagwalk.com/2010/08/experimental-foursquare-recommendations/#comments</comments>
		<pubDate>Wed, 04 Aug 2010 01:03:27 +0000</pubDate>
		<dc:creator>Tim Hastings</dc:creator>
				<category><![CDATA[Commentary]]></category>
		<category><![CDATA[foursquare]]></category>
		<category><![CDATA[mashups]]></category>
		<category><![CDATA[recommendation]]></category>
		<category><![CDATA[twitter]]></category>

		<guid isPermaLink="false">http://blog.tagwalk.com/?p=113</guid>
		<description><![CDATA[Foursquare and Locations After many months of operation, TagWalk has collected over 100 million tweets, and tracked over 40 million URLs. There are increasing numbers of Foursquare URLs popping up, and due to good RESTful design from the Foursquare team, I noticed that when users tweet their checkins, it is possible to pattern match their [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Foursquare and Locations<br />
</strong></p>
<p>After many months of operation, TagWalk has collected over 100 million tweets, and tracked over 40 million URLs. There are increasing numbers of Foursquare URLs popping up, and due to good <a href="http://en.wikipedia.org/wiki/Representational_State_Transfer">RESTful</a> design from the Foursquare team, I noticed that when users tweet their checkins, it is possible to pattern match their venue URLs.</p>
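<p>As a rough sketch of that pattern matching (not TagWalk&#8217;s actual code): the URL shape <code>http://foursquare.com/venue/ID</code>, the regular expression and the function name below are all assumptions for illustration, and the link is assumed to have already been expanded from any URL shortener.</p>

```python
import re

# Assumed URL shape (not confirmed by the post): http://foursquare.com/venue/NUMERIC_ID
VENUE_URL = re.compile(r"https?://(?:www\.)?foursquare\.com/venue/(\d+)")


def extract_venue_ids(tweet_text: str) -> list[int]:
    """Return every venue ID found in a tweet, in order of appearance."""
    return [int(match) for match in VENUE_URL.findall(tweet_text)]


# Hypothetical check-in tweet containing an already-expanded venue link.
sample = "I'm at Twitter HQ http://foursquare.com/venue/49547 with friends"
print(extract_venue_ids(sample))  # [49547]
```

<p>The numeric ID is what makes the URL usable as a symbolic location: it can be stored, counted and looked up against the venue metadata.</p>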
<p>In true mashup fashion, after a quick nosey through the <a href="http://groups.google.com/group/foursquare-api/web/api-documentation">Foursquare API documentation</a>, I was quickly able to fetch additional metadata about venues and present it nicely in the links section, in the same way that image thumbnails are shown for TwitPic and the other Twitter photo services.</p>
<p><img class="alignnone size-full wp-image-135" title="foursquare-locations" src="http://blog.tagwalk.com/wp-content/uploads/2010/08/foursquare-locations2.png" alt="" width="420" height="204" /></p>
<p><strong>I ♥ Recommendation Engines</strong></p>
<p>I am a massive fan of recommendation engines. One of my goals for building TagWalk was to create a recommendation engine that would suggest <a href="http://blog.tagwalk.com/2010/08/user-interest-ego-vs-reputation-trust/">users with the most reputation</a> for particular subjects or #hashtags, and show popular links being tweeted by users. My inspiration is probably the oldest and most successful example I can think of, Amazon&#8217;s <em>&#8220;Customers who bought this also bought&#8230;&#8221;</em></p>
<p><em><img title="Amazon Recommendation" src="http://blog.tagwalk.com/wp-content/uploads/2010/08/amazon-recommendation.png" alt="" width="400" height="157" /></em></p>
<p><strong>Location-based Recommendations<br />
</strong></p>
<p>Treating Foursquare URLs as symbolic locations instead of web pages unlocks lots of new meaning buried in Twitter data. It makes location-based analytics and recommendations possible. By crunching some historic data it was possible to create a very crude location recommendation engine, or rather,<em> &#8220;people who went here, also went there&#8221;</em>.</p>
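<p>One crude way to get <em>&#8220;people who went here, also went there&#8221;</em> is to count co-visits per user. The sketch below is an illustration under that assumption, not the engine behind TagWalk; the function name and the sample check-ins are made up.</p>

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_recommendations(checkins):
    """For each venue, count how many of its visitors also checked in at each other venue."""
    venues_by_user = defaultdict(set)
    for user, venue in checkins:
        venues_by_user[user].add(venue)

    also_went = defaultdict(Counter)
    for venues in venues_by_user.values():
        # Every ordered pair of distinct venues visited by the same user.
        for here, there in permutations(sorted(venues), 2):
            also_went[here][there] += 1
    return also_went


# Made-up (user, venue) check-ins for illustration only.
checkins = [
    ("alice", "SFO"), ("alice", "Twitter HQ"),
    ("bob", "SFO"), ("bob", "Twitter HQ"), ("bob", "Apple Store"),
    ("carol", "SFO"), ("carol", "Apple Store"), ("carol", "Twitter HQ"),
    ("dave", "Twitter HQ"),
]
recs = build_recommendations(checkins)
print(recs["SFO"].most_common(2))  # [('Twitter HQ', 3), ('Apple Store', 2)]
```

<p>Raw counts favour venues that are popular with everybody, so a less crude version would probably normalise by each venue&#8217;s total number of visitors.</p>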
<p>Some locations of interest and their related <em>&#8220;people who went here, also went there&#8221;</em> recommendations are: <a href="http://tagwalk.com/foursquare/12238">San Francisco Airport (SFO) ✈</a>, <a href="http://tagwalk.com/foursquare/49547">Twitter HQ</a>, <a href="http://tagwalk.com/foursquare/128530">Foursquare HQ</a>, <a href="http://tagwalk.com/foursquare/47783">Facebook HQ</a>, <a href="http://tagwalk.com/foursquare/46077">Tech Crunch HQ</a>, <a href="http://tagwalk.com/foursquare/65791">Y Combinator</a>, <a href="http://tagwalk.com/foursquare/3945">The White House</a>, and the San Francisco <a href="http://tagwalk.com/foursquare/20088">Apple Store</a>.</p>
<p><strong>Sample Size<br />
</strong></p>
<p>Foursquare recently trumpeted their <a href="http://twitter.com/foursquare/status/19026739371">100,000,000th checkin</a>. This dataset covers less than 0.2% of this: just over 150,000 checkins and approximately 80,000 locations. This is a large enough sample size to see some interesting relationships start to emerge. As the population of Twitter users that TagWalk follows has a statistical bias to San Francisco and other tech areas, it figures that these tech-savvy areas have better coverage in my data set than other areas.</p>
<p><strong>What next?</strong></p>
<p>Whilst the data set is not (and can never be) complete, a large enough sample size can yield &#8220;good enough&#8221; results. By selectively loading more data, the data set can be adapted to improve the quality of results. Over time, TagWalk will continue to collect small volumes of checkins, but specific search filtering could be used to prioritise certain regions or businesses.</p>
<p>With further digging around in the datamine, it is possible to produce location-based recommendations, or rather,<em> &#8220;people who went here, also went there&#8221;</em> and, with considerably more computational effort, <em>&#8220;people who went here, also have the following interests, reputation, talk with these users, and visit these web sites.&#8221;</em></p>
<p>If you run a business, particularly one with many locations, and would like to work with me to get to know your customers better, then please get in touch.</p>
<p>You can follow me here: <a href="http://twitter.com/timhastings">@timhastings</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.tagwalk.com/2010/08/experimental-foursquare-recommendations/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Reputation (trust) vs. User Interest (ego)</title>
		<link>http://blog.tagwalk.com/2010/08/user-interest-ego-vs-reputation-trust/</link>
		<comments>http://blog.tagwalk.com/2010/08/user-interest-ego-vs-reputation-trust/#comments</comments>
		<pubDate>Tue, 03 Aug 2010 11:12:06 +0000</pubDate>
		<dc:creator>Tim Hastings</dc:creator>
				<category><![CDATA[Commentary]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[ego]]></category>
		<category><![CDATA[reputation]]></category>
		<category><![CDATA[trust]]></category>
		<category><![CDATA[wisdom of crowds]]></category>

		<guid isPermaLink="false">http://blog.tagwalk.com/?p=105</guid>
		<description><![CDATA[When trying to find interesting people online, it is useful to be able to differentiate between somebody who is very interested in a particular subject from somebody who has a good reputation for that subject. In the example below, there is a very shy but talented Elvis impersonator. By examining the tweets, we can see [...]]]></description>
			<content:encoded><![CDATA[<p>When trying to find interesting people online, it is useful to be able to differentiate somebody who is very interested in a particular subject from somebody who has a good reputation for that subject.</p>
<p>In the example below, there is a very shy but talented Elvis impersonator.</p>
<p><img class="size-full wp-image-106 alignnone" title="Reputation from tweets at work" src="http://blog.tagwalk.com/wp-content/uploads/2010/08/messages-sm.png" alt="" width="400" height="163" /></p>
<p>By examining the tweets, we can see three cases where &#8220;ronnie&#8221; is mentioned with Elvis, yet he himself never mentions Elvis at all.</p>
<p>Here&#8217;s another illustration of the difference: who is failing and who is a troll?</p>
<p><img class="alignnone size-full wp-image-107" title="Interest (ego) vs Reputation (trust)" src="http://blog.tagwalk.com/wp-content/uploads/2010/08/interest-vs-reputation.png" alt="" width="451" height="181" /></p>
<p>In the title of this post, I&#8217;ve related interest and reputation to ego and trust. Going off what an individual says only takes one voice into account: their ego. Many services use follower count as a measure of reputation but, in my opinion, conversation matters most! Listening to what many people say gives a more trustworthy answer.</p>
<p><a href="http://tagwalk.com/">TagWalk</a> associates what people are saying in their tweets (interest) and who they are saying it to (passing reputation along). This means that TagWalk can make recommendations on a certain hashtag or word. In doing so, it also derives relationships between words, hashtags and URLs. This allows you to browse through the relationships, discovering new people and content.</p>
<p>Here are some example hashtags to explore: <a href="http://tagwalk.com/tag/followfriday">#followfriday</a>, <a href="http://tagwalk.com/tag/socialmedia">#socialmedia</a>, <a href="http://tagwalk.com/tag/wine">#wine</a>, <a href="http://tagwalk.com/tag/nodejs">#nodejs</a>, and <a href="http://tagwalk.com/tag/beer">#beer</a>.</p>
<p>You can also explore an individual&#8217;s reputation by checking out their own pages. Here are the pages of some prominent Twitter users: <a href="http://tagwalk.com/user/BillGates">@BillGates</a>, <a href="http://tagwalk.com/user/scobleizer"><span class="label"><span class="number">@Scobleizer</span></span></a>, <a href="http://tagwalk.com/user/guykawasaki">@guykawasaki</a>, <a href="http://tagwalk.com/user/pistachio">@pistachio</a> and <a href="http://tagwalk.com/user/arrington">@arrington</a>.</p>
<p>There is wisdom in them thar crowds!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.tagwalk.com/2010/08/user-interest-ego-vs-reputation-trust/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Twitter User Reputation Computed from Tweets</title>
		<link>http://blog.tagwalk.com/2009/11/twitter-user-reputation-computed-from-tweets/</link>
		<comments>http://blog.tagwalk.com/2009/11/twitter-user-reputation-computed-from-tweets/#comments</comments>
		<pubDate>Thu, 26 Nov 2009 00:28:31 +0000</pubDate>
		<dc:creator>Tim Hastings</dc:creator>
				<category><![CDATA[Commentary]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[recommendation]]></category>
		<category><![CDATA[reputation]]></category>
		<category><![CDATA[twitter]]></category>
		<category><![CDATA[wisdom of crowds]]></category>

		<guid isPermaLink="false">http://blog.tagwalk.com/?p=82</guid>
		<description><![CDATA[The Twitter ecosystem has lots of different reputation mechanisms at the moment and I would like to discuss these and then make my own contribution. User Generated Directories There are directories like wefollow.com where users add themselves by choosing which categories they belong. This is effective, but I see there are some flaws with this [...]]]></description>
			<content:encoded><![CDATA[<p>The Twitter ecosystem has lots of different reputation mechanisms at the moment and I would like to discuss these and then make my own contribution.</p>
<p><strong>User Generated Directories </strong></p>
<p>There are directories like <a title="WeFollow" href="http://wefollow.com/">wefollow.com</a> where users add themselves by choosing which categories they belong to. This is effective, but I see some flaws with this approach for use as a reputation system:</p>
<ul>
<li>The user&#8217;s motive for listing themselves in the directory is self-promotion.</li>
<li>It is open to bogus claims. For example: I could tag myself as an #seo expert when I may know very little on the subject.</li>
<li>Relevance and ranking. What determines which user is higher in a category than another user? Connectedness and number of followers can be used, but my single-malt whiskey followers also boost my rank in the PHP developers category.</li>
<li>Staleness. The oldest users may well have the most followers and influence. How can a new-comer hope to gain rank?</li>
</ul>
<p><strong>Twitter Lists</strong></p>
<p>Fairly new on the scene is Twitter Lists. This feature of Twitter allows users to build arbitrary public and private lists and add users to them. This could be viewed as tagging by another name. This is a great usability enhancement to the Twitter service, and they make it much easier to partition friends into subject groups. I imagine a valuable side-effect (and end-game) for Twitter is a new metric: <em>how many lists does a user belong to? </em>Or alternatively<em> &#8220;how many people think this user is <span style="text-decoration: underline;">worth</span> adding to a list?&#8221;</em> Emphasis on &#8220;worth&#8221;, as in, how many users value this user?</p>
<p>It is early days for Twitter Lists. They are clearly a usability win! Whether they get used to build a reputation system is yet to be seen, but I think they are definitely helpful when screening for news feed bots or spammy accounts, which are unlikely to appear on anyone&#8217;s lists.</p>
<p>My concern around any reputation system built around Twitter Lists is that it costs very little effort to add a user to a list, merely the motivation to do it. Also <strong>list membership is black and white;</strong> a user cannot be &#8220;very in a list&#8221; or &#8220;a little bit in a list&#8221; &#8211; a user either belongs to a list or does not. Michael Gray posted about <a href="http://www.wolf-howl.com/seo/twitter-lists-orm/">How to Use Twitter Lists To Create Reputation Management Problems</a> which illustrated how a simple act of adding a user to a list can taint a user&#8217;s reputation.</p>
<p>Update 14-Jan-2010: <a href="http://www.mustexist.com/list_tags">MustExist&#8217;s List Tag</a> is a working reputation system which derives reputation from list membership for a given user name. Given a user&#8217;s screen name, it processes their list memberships and produces a tag cloud. For example, this is what it says about <a href="http://www.mustexist.com/list_tags/timhastings">timhastings</a>. This is a good example of list-based reputation. If I were to suggest an improvement, it would be to allow the discovery of users within any given #hashtag. Thanks to <a href="http://twitter.com/eric_andersen">@eric_andersen</a> for the tip.</p>
<p><strong>Wisdom of Crowds</strong></p>
<p>In my opinion, a good reputation system should be derived from user activity and the relationships a user has with other users. I want a system which observes Twitter activity and then <strong>auto classifies </strong>users based on <strong>evidence</strong>. Each time somebody talks to me and uses a particular tag, it should increase my score for that tag. The system should be able to differentiate somebody who just talks a lot (self-promotion) from somebody who is mentioned a lot (reputable). The number of different talkers using a tag defines the size of that community.</p>
<p style="text-align: center;"><em>Reputation emerges </em><em>from monitoring Twitter activity and aggregating statistics.<br />
</em></p>
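<p>To make the scoring idea concrete, here is a minimal sketch under simplifying assumptions: tweets arrive as (author, text) pairs, tags and mentions are pulled out with naive regular expressions, and every score is a plain count. The names and the sample tweets are invented for illustration; this is not TagWalk&#8217;s implementation.</p>

```python
import re
from collections import Counter, defaultdict

HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")


def score_tweets(tweets):
    """tweets is an iterable of (author, text) pairs."""
    interest = defaultdict(Counter)    # tag -> author -> tweets written using the tag
    reputation = defaultdict(Counter)  # tag -> user -> times mentioned by others with the tag
    for author, text in tweets:
        tags = {t.lower() for t in HASHTAG.findall(text)}
        # Mentioning yourself earns nothing: only other people's voices count.
        mentioned = {m.lower() for m in MENTION.findall(text)} - {author.lower()}
        for tag in tags:
            interest[tag][author.lower()] += 1
            for user in mentioned:
                reputation[tag][user] += 1
    return interest, reputation


def talkers(interest, tag):
    """Number of distinct authors who used the tag: the size of that community."""
    return len(interest[tag])


# Invented tweets: ronnie is talked about, dan only talks.
tweets = [
    ("anna", "@ronnie was amazing last night #elvis"),
    ("ben", "Best tribute act ever, thanks @ronnie #elvis"),
    ("cat", "#elvis fans should see @ronnie"),
    ("dan", "#elvis #elvis #elvis I love #elvis"),
    ("ronnie", "Off to rehearsal"),
]
interest, reputation = score_tweets(tweets)
print(reputation["elvis"].most_common(1))  # [('ronnie', 3)]
print(talkers(interest, "elvis"))          # 4
```

<p>Here dan talks about #elvis a lot but nobody mentions him, so he has interest and no reputation, while ronnie never uses the tag himself yet tops the reputation count.</p>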
<p>Many <a href="http://en.wikipedia.org/wiki/The_Wisdom_of_Crowds">Wisdom of Crowds</a> systems are well known and very successful. The two best examples I can think of are Amazon&#8217;s Recommendation Engine and the Delicious Bookmarking system.</p>
<p>Amazon aggregates sales information and computes what shopper A would also like to purchase from what users with similar purchase histories have bought. Delicious is far simpler; it relies on users &#8216;tagging&#8217; bookmarks. If lots of users tag a bookmark as &#8216;webdesign&#8217;, then it probably has a lot to do with &#8216;webdesign&#8217;.</p>
<p>The Delicious model is the closest fit for a Twitter Reputation system. With Delicious, the aggregate value is gained from individuals&#8217; selfish desire to organise their bookmarks. In our Reputation system, the aggregate value is derived from individuals&#8217; selfish desire to communicate, and it is given a turbo boost when they use #hashtags, hyperlinks and the user names of their friends.</p>
<p><strong>Demonstrating this Approach</strong></p>
<p>This blog is about a project of mine to build a great <strong>Reputation and Recommendation </strong>system for Twitter. <a href="http://tagwalk.com/">TagWalk</a> analyzes tweets and keeps track of who said what to whom. It maintains a data mine of relationships. Each relationship gets strengthened each time it is seen in a tweet. This is then available to explore and discover online.</p>
<p>For example, if we take a well known Twitter phenomenon, <a href="http://tagwalk.com/tag/followfriday">#followfriday</a> and use this as a basis for computing reputation, we can see some leaders emerge, as well as related hash-tags, and words used.</p>
<p><img class="alignright size-full wp-image-88" title="Reputation based on #followfriday" src="http://blog.tagwalk.com/wp-content/uploads/2009/11/followfriday1.png" alt="Reputation based on #followfriday" width="446" height="299" /><br />
Our metrics around the volume of tweets and number of talkers help to define the size of the audience (or niche). They also show how a new leader can quickly emerge.</p>
<p>I am also fairly certain that a new breed of SEO will evolve which will specialise in Social Media Reputation Optimization (SMRO).</p>
<p><strong>In Conclusion</strong></p>
<p>The number of friends, followers and how many lists you belong to are not enough to build a great reputation system. To be truly great, we must pay attention to the content and flow of the messages between users.</p>
<p>Wisdom of Crowds &#8211; it&#8217;s the only way to be sure.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.tagwalk.com/2009/11/twitter-user-reputation-computed-from-tweets/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Incubation Over, Server Migration Complete</title>
		<link>http://blog.tagwalk.com/2009/11/incubation-over-server-migration-complete/</link>
		<comments>http://blog.tagwalk.com/2009/11/incubation-over-server-migration-complete/#comments</comments>
		<pubDate>Mon, 23 Nov 2009 01:25:39 +0000</pubDate>
		<dc:creator>Tim Hastings</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[backup]]></category>
		<category><![CDATA[migration]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[nosql]]></category>
		<category><![CDATA[restore]]></category>

		<guid isPermaLink="false">http://blog.tagwalk.com/?p=75</guid>
		<description><![CDATA[After 9 months sharing an Amazon EC2 instance with some of my other experimental projects it was finally time for TagWalk to &#8220;fly the nest&#8221;. The low spec virtual server was just not enough for TagWalk&#8217;s storage and data processing requirements. Looking at the economics of cost vs. performance, it made sense to pony up [...]]]></description>
			<content:encoded><![CDATA[<p>After 9 months sharing an Amazon EC2 instance with some of my other experimental projects it was finally time for TagWalk to &#8220;fly the nest&#8221;. The low spec virtual server was just not enough for TagWalk&#8217;s storage and data processing requirements. Looking at the economics of cost vs. performance, it made sense to pony up some money for a dedicated server as a new home for TagWalk.</p>
<p>Once the decision was made, it was very quick and painless to secure a new dedicated host. The hardest part was moving the TagWalk data mine from &#8216;the cloud&#8217; to a real server.</p>
<p>It has been an interesting exercise, and an excellent opportunity to validate many aspects of the design, identify weaknesses, and learn some interesting and painful lessons &#8212; every day&#8217;s a school day right?</p>
<p>Non-technical:</p>
<ul>
<li>Migration is an excellent rehearsal for disaster recovery.</li>
<li>Restoring a read-only mirror of the site was a good start goal.</li>
<li>Create a journal file with all your command snippets.</li>
</ul>
<p>Technical:</p>
<ul>
<li>Ubuntu Server 9.10 rocks.</li>
<li>DNS can be very useful. Subdomains are great.</li>
<li>Moving large volumes of data from one data center to another sounds daunting, but that was the easiest of jobs. The transfer rate was about 1.4MB/s and, generally speaking, the data could arrive faster than it could be dealt with.</li>
<li>Sharding was a good idea which pays huge dividends (never in doubt)</li>
<li>Waiting 20+ hours for a single table to restore really makes you question alternative methods of storage.</li>
<li>Mono 2.4.3.4 has some issues which required VB.Net stuff to require &#8220;Option Strict&#8221;</li>
</ul>
<p>MySQL Specific:</p>
<ul>
<li>MySQL imports can be *REALLY* slow. Under certain circumstances restoring large tables that have a UNIQUE INDEX can be very slow. I had heard anecdotal evidence but never really paid attention to it. Due to some rookie mistakes made along the way, it took me several attempts to restore my largest table. This problem alone set me back at least 2 days.</li>
<li>Giving MySQL as much memory as possible can considerably improve import times. Optimize MySQL for the task in hand.</li>
<li>MySQL&#8217;s <a href="http://dev.mysql.com/doc/refman/5.1/en/multiple-tablespaces.html">innodb_file_per_table</a> setting is the only way to go.</li>
<li>The Linux nohup command is the only way to run long MySQL restore operations.</li>
<li>Not using MySQL for the bulk of the data mine meant that for 70% of the data it was a zip-and-ship migration strategy. +1 for NoSQL!</li>
<li>Representing a MD5 digest as 4 unsigned integers is much more efficient than a 32 character string.</li>
<li>Restoring the data is easy; it is the indexes and constraints which take the time. A read-only version of your site can be up very quickly; the constraints are only necessary when you switch to read/write.</li>
<li>MySQL table switcheroos with the <a href="http://dev.mysql.com/doc/refman/5.0/en/rename-table.html">RENAME TABLE</a> command allow your site to use temporary copies of your tables which do not have all the indexes or use the right storage engine. The most effective large-table restore strategy I arrived at was to restore the data into MyISAM tables without expensive indexes and get the site to use these for read-only access. Meanwhile, populate the expensive versions of these tables with the full set of indexes (InnoDB + unique indexes).</li>
<li>Some big MySQL operations cannot be canceled. If a script load or INSERT/SELECT dies, MySQL wants to &#8220;close tables&#8221; or &#8220;clean up&#8221;, and this can take forever. In many cases, it took server reboots and MySQL remove/reinstalls to overcome this.</li>
</ul>
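<p>The MD5 point in the list above can be sketched as follows. This only shows the packing itself; the big-endian byte order, the function names and the idea of storing the result in four unsigned integer columns are my assumptions, not a description of TagWalk&#8217;s schema.</p>

```python
import hashlib
import struct


def md5_as_uint32s(data: bytes) -> tuple[int, int, int, int]:
    """Split the 16-byte MD5 digest into four big-endian unsigned 32-bit integers."""
    return struct.unpack(">4I", hashlib.md5(data).digest())


def uint32s_to_hex(parts) -> str:
    """Rebuild the familiar 32-character hex string from the four integers."""
    return struct.pack(">4I", *parts).hex()


url = b"http://tagwalk.com/"
parts = md5_as_uint32s(url)
print(parts)
print(uint32s_to_hex(parts) == hashlib.md5(url).hexdigest())  # True
```

<p>The four integers take 16 bytes in total, against at least 32 bytes for the hex string, and comparing integers is cheaper than comparing strings, which matters when the digest sits in an index.</p>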
<p>The new server has more than double the memory, which will give memcache and MySQL a much-needed, massive RAM boost. I also plumped for RAID, as moving away from EC2&#8217;s Elastic Block Store means a loss of redundancy on disk, a level of risk I am uncomfortable with given the volumes of data.</p>
<p>TagWalk has settled into its new environment very quickly. The performance benefits are immediately apparent. As I begin to pay more attention I will be able to optimise this performance further. More importantly, the additional machine capacity will enable me to develop the next set of features.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.tagwalk.com/2009/11/incubation-over-server-migration-complete/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Image for Retweet or Share with Twitter Button</title>
		<link>http://blog.tagwalk.com/2009/07/image-for-retweet-or-share-with-twitter-button/</link>
		<comments>http://blog.tagwalk.com/2009/07/image-for-retweet-or-share-with-twitter-button/#comments</comments>
		<pubDate>Thu, 16 Jul 2009 23:04:44 +0000</pubDate>
		<dc:creator>Tim Hastings</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[button]]></category>
		<category><![CDATA[share]]></category>
		<category><![CDATA[ui]]></category>

		<guid isPermaLink="false">http://blog.tagwalk.com/?p=68</guid>
		<description><![CDATA[After adding the &#8220;Share with Twitter&#8221; feature to TagWalk, I was not happy with the buttons available so I have modified the Twitter OAuth button. Please feel free to use it if you would like:]]></description>
			<content:encoded><![CDATA[<p>After adding the &#8220;Share with Twitter&#8221; feature to TagWalk, I was not happy with the buttons available so I have modified the Twitter OAuth button. Please feel free to use it if you would like:</p>
<p><img class="alignnone size-full wp-image-69" title="Retweet, Tweet This, or Share with Twitter" src="http://blog.tagwalk.com/wp-content/uploads/2009/07/share-with-twitter.png" alt="Retweet, Tweet This, or Share with Twitter" width="151" height="24" /></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.tagwalk.com/2009/07/image-for-retweet-or-share-with-twitter-button/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>User Profiles Now Showing</title>
		<link>http://blog.tagwalk.com/2009/07/user-profiles-now-showing/</link>
		<comments>http://blog.tagwalk.com/2009/07/user-profiles-now-showing/#comments</comments>
		<pubDate>Thu, 16 Jul 2009 23:01:00 +0000</pubDate>
		<dc:creator>Tim Hastings</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[twitter-api]]></category>
		<category><![CDATA[ui]]></category>

		<guid isPermaLink="false">http://blog.tagwalk.com/?p=61</guid>
		<description><![CDATA[After some recent work to TagWalk the site now shows user profiles. A local copy of the profiles is kept in the database and cached in memcache so page rendering is as fast as possible. Profiles are refreshed if a user posts or is mentioned in the tweets processed by the tweet processors and the [...]]]></description>
			<content:encoded><![CDATA[<p>After some recent work to <a href="http://tagwalk.com/">TagWalk</a> the site now shows user profiles. A local copy of the profiles is kept in the database and cached in memcache so page rendering is as fast as possible.</p>
<p><img class="alignnone size-full wp-image-62" title="Twitter User Profiles now displayed on TagWalk" src="http://blog.tagwalk.com/wp-content/uploads/2009/07/tagwalk-twitter-user-profiles.png" alt="Twitter User Profiles now displayed on TagWalk" width="487" height="328" /></p>
<p>Profiles are refreshed if a user posts or is mentioned in the tweets processed by the tweet processors and the profile held in the database is older than 7 days.</p>
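<p>That refresh rule boils down to a small check. The sketch below is illustrative only: the function and parameter names are hypothetical, and treating a never-stored profile as stale is my assumption.</p>

```python
from datetime import datetime, timedelta

MAX_PROFILE_AGE = timedelta(days=7)


def needs_refresh(fetched_at, now, seen_in_tweet):
    """Refresh only when the user appears in a processed tweet and the stored copy is stale."""
    if not seen_in_tweet:
        return False
    if fetched_at is None:  # assumption: a profile we have never stored is always fetched
        return True
    return now - fetched_at > MAX_PROFILE_AGE


now = datetime(2009, 7, 16, 23, 0)
print(needs_refresh(datetime(2009, 7, 1), now, seen_in_tweet=True))   # True
print(needs_refresh(datetime(2009, 7, 15), now, seen_in_tweet=True))  # False
print(needs_refresh(datetime(2009, 7, 1), now, seen_in_tweet=False))  # False
```

<p>Tying the refresh to activity means that dormant accounts never cost an API call.</p>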
<p>The other user information pulled from the <a href="http://apiwiki.twitter.com/">Twitter API</a> is useful for me to determine how many of the user&#8217;s tweets have been processed by TagWalk versus how many tweets they have altogether. At a later date, I can backfill a user&#8217;s tweets from the API, although rumour has it that the Twitter API only supports access to the <a href="http://groups.google.com/group/twitter-development-talk/browse_thread/thread/4b0344256f909b28?hl=en">last 3,200 tweets of a user</a>.</p>
<p>So far, the tweet processors have fetched 382,000 user profiles. Which is nice.</p>
<p>In other news, I&#8217;ve also added a retweety button which allows a user to easily post a tweet for the page they are looking at. This is done by linking to the Twitter site and populating the message box.</p>
<p>I have more to be getting on with, but if you have any suggestions for TagWalk, please let me know.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.tagwalk.com/2009/07/user-profiles-now-showing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Twitter Statistics Now Showing on TagWalk</title>
		<link>http://blog.tagwalk.com/2009/06/twitter-statistical-now-showing/</link>
		<comments>http://blog.tagwalk.com/2009/06/twitter-statistical-now-showing/#comments</comments>
		<pubDate>Mon, 15 Jun 2009 01:01:51 +0000</pubDate>
		<dc:creator>Tim Hastings</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[stats]]></category>
		<category><![CDATA[ui]]></category>

		<guid isPermaLink="false">http://blog.tagwalk.com/?p=47</guid>
		<description><![CDATA[This evening I have mostly been doing front end work on TagWalk, exposing the behind-the-scenes computed numbers. These numbers are very useful, but previous attempts to display them had looked ugly. I took some inspiration from the way that Twitter displays their follower count information, and I am very pleased with the result. TagWalk breaks tweets [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-full wp-image-54" title="Twitter Statistics now showing on TagWalk" src="http://blog.tagwalk.com/wp-content/uploads/2009/06/twitter-statistics-tagwalk2.png" alt="Twitter Statistics now showing on TagWalk" width="359" height="280" /></p>
<p>This evening I have mostly been doing front end work on <a title="Statistical Analysis of Twitter" href="http://tagwalk.com/">TagWalk</a>, exposing the behind-the-scenes computed numbers. These numbers are very useful, but previous attempts to display them had looked ugly. I took some inspiration from the way that Twitter displays their follower count information, and I am very pleased with the result.</p>
<p>TagWalk breaks tweets apart into things, and computes the relationship between the things that appear together in messages. Depending on what thing you are looking at, the statistics can be interpreted differently.</p>
<p>Like most statistics, they are open to interpretation, so I would like to explain what is shown, and most importantly, the meaning.</p>
<p><strong>Tweets:</strong> This is the sample size. It shows the number of tweets that have been processed and flagged with this term. Very importantly, TagWalk goes to a lot of trouble to ensure that any tweet is only counted once, so this figure does not include any duplicates. This is controlled by the Twitter allocated message ID (the one which is about to overflow in the <a href="http://www.techcrunch.com/2009/06/12/all-hell-may-break-loose-on-twitter-in-2-hours/">Twitpocalypse</a>). Duplicated user content is not filtered, so if a user does repost the same message, this will be counted. This is important for identifying spammy users.</p>
<p><strong>Retweets: </strong>What percentage of these tweets were a retweet? For TagWalk, the definition of a retweet is whether the message begins with RT, or Retweet, or contains a &#8220;via username&#8221; pattern towards the end of the tweet.</p>
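<p>A hedged sketch of that retweet rule: the two regular expressions are my guesses at the patterns rather than the ones TagWalk uses, and &#8220;towards the end&#8221; is simplified here to &#8220;at the very end, with an @ in front of the username&#8221;. Keying the tweets by message ID mirrors the count-once rule from the Tweets metric.</p>

```python
import re

# Assumed patterns: the post describes the rule but not the exact expressions.
STARTS_WITH_RT = re.compile(r"^\s*(?:RT|retweet)\b", re.IGNORECASE)
ENDS_WITH_VIA = re.compile(r"\bvia @\w+\)?\s*$", re.IGNORECASE)


def is_retweet(text):
    """True if the text begins with RT or Retweet, or ends with a 'via @username' pattern."""
    return bool(STARTS_WITH_RT.search(text) or ENDS_WITH_VIA.search(text))


def retweet_percentage(tweets_by_id):
    """tweets_by_id maps Twitter message ID to text, so each tweet is counted once."""
    if not tweets_by_id:
        return 0.0
    retweets = sum(is_retweet(text) for text in tweets_by_id.values())
    return 100.0 * retweets / len(tweets_by_id)


# Made-up sample keyed by message ID.
tweets_by_id = {
    1: "RT @timhastings: TagWalk now shows statistics",
    2: "Reputation computed from tweets http://tagwalk.com/ via @timhastings",
    3: "Off to BarCamp Leeds",
    4: "RTOS kernels are fun",
}
print(retweet_percentage(tweets_by_id))  # 50.0
```

<p>The word boundary after RT is what stops a tweet about &#8220;RTOS&#8221; from being counted as a retweet.</p>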
<p><strong>With Links:</strong> The percentage of the tweets which contained a link to a valid website. This is useful to identify whether a user or hashtag is particularly spammy. Especially when used in conjunction with the Web Sites metric.</p>
<p><strong>Hashtags: </strong>How many other hashtags have been used in conjunction with this thing?</p>
<p><strong>Talkers:</strong> This identifies the number of people talking about this thing. It is derived from the distinct users, so it is the population size of talkers.</p>
<p><strong>To Users:</strong> This is different from Talkers, because it analyses the users who appear in the message body rather than the users who wrote the messages. It is useful to have this as a separate measure because it helps to determine the target or focus of the conversation or thing. A small number is a narrowly focused laser beam, versus a widely scattered meme.</p>
<p><strong>Web Sites:</strong> When links are included in tweets, TagWalk tries to determine the destination web site. This number shows the distinct host names of links (after redirects through URL shortening services). To try to get a more accurate figure, www is dropped from the host name.</p>
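<p>A minimal sketch of the host name counting described above, assuming the links have already been resolved through any shortening services. The function names are made up for the example.</p>

```python
from urllib.parse import urlsplit


def site_of(url):
    """Return the lower-cased host name of a URL with a leading 'www.' dropped."""
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host


def distinct_sites(resolved_urls):
    """Set of distinct host names across already-resolved URLs."""
    sites = set()
    for url in resolved_urls:
        host = site_of(url)
        if host:
            sites.add(host)
    return sites


print(sorted(distinct_sites([
    "http://www.example.com/a",
    "http://example.com/b",
    "http://blog.tagwalk.com/feed/",
])))
```

<p>Dropping www means that www.example.com and example.com are counted as one site rather than two.</p>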
<p>For any stato boffin, sample size is very important. TagWalk statistics are based on a sample of the Twitter-sphere, and are not real time. The messages are fetched using &#8220;search seeds&#8221; which are used to hit the Twitter Search API; at the moment the stream APIs are not being used. If you would like to suggest seed terms or would like some comprehensive research, please get in touch.</p>
<p>I hope you find this statistics enhancement useful. Any feedback or suggestions you may have would be very welcome!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.tagwalk.com/2009/06/twitter-statistical-now-showing/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Data Mining Twitter, a BarCamp Leeds talk</title>
		<link>http://blog.tagwalk.com/2009/06/data-mining-twitter-barcamp-leeds-talk/</link>
		<comments>http://blog.tagwalk.com/2009/06/data-mining-twitter-barcamp-leeds-talk/#comments</comments>
		<pubDate>Thu, 04 Jun 2009 00:23:19 +0000</pubDate>
		<dc:creator>Tim Hastings</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[barcamp]]></category>
		<category><![CDATA[barcampleeds]]></category>
		<category><![CDATA[bcleeds09]]></category>
		<category><![CDATA[demo]]></category>
		<category><![CDATA[recommendation]]></category>

		<guid isPermaLink="false">http://blog.tagwalk.com/?p=27</guid>
		<description><![CDATA[Last Saturday I attended BarCamp Leeds and gave a talk about Data Mining Twitter, covering the theory behind TagWalk and then giving a demonstration. After a shaky start with projector difficulties, the session got back on track and there were a good number of questions and recommendations.]]></description>
			<content:encoded><![CDATA[<p><img class="size-full wp-image-33 alignnone" title="Data Mining Twitter, Tim Hastings, BarCamp Leeds 2009" src="http://blog.tagwalk.com/wp-content/uploads/2009/06/3581552670_c8915fa17a.jpg" alt="Data Mining Twitter, Tim Hastings, BarCamp Leeds 2009" width="350" height="263" /></p>
<p>Last Saturday I attended <a title="BarCamp Leeds" href="http://barcampleeds.com/">BarCamp Leeds</a> and gave a talk about Data Mining Twitter, covering the theory behind <a title="TagWalk: Data Mining Twitter" href="http://tagwalk.com/">TagWalk</a> and then giving a demonstration. After a shaky start with projector difficulties, the session got back on track and there were a good number of questions and recommendations.</p>
<p>In a nutshell, TagWalk is a Recommendation Engine which uses Twitter messages as its input. In the same way that Amazon suggests books you might like by examining the purchase history of other shoppers, TagWalk is able to recommend related hashtags, users and URLs.</p>
<p>Messages are decomposed into ‘tags’. These are not just explicit #hashtags, but also ordinary words, URLs and @usernames. For any given tag, TagWalk will show the other tags, words, users and links that are commonly used alongside it.</p>
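<p>To make the idea concrete, here is a small sketch of decomposing messages into tags and counting what appears alongside a given tag. It is an illustration under simple assumptions (a single regular expression for tokenising, lower-casing everything except URLs), not the code that runs behind TagWalk.</p>

```python
import re
from collections import Counter

# Order matters: URLs first, then hashtags, then usernames, then plain words.
TOKEN = re.compile(r"https?://\S+|#\w+|@\w+|\w+")


def decompose(message):
    """Split a message into (kind, value) tags."""
    tags = []
    for token in TOKEN.findall(message):
        if token.startswith(("http://", "https://")):
            tags.append(("url", token))
        elif token.startswith("#"):
            tags.append(("hashtag", token[1:].lower()))
        elif token.startswith("@"):
            tags.append(("user", token[1:].lower()))
        else:
            tags.append(("word", token.lower()))
    return tags


def related(messages, tag):
    """Count the tags that appear in the same message as `tag`."""
    counts = Counter()
    for message in messages:
        tags = set(decompose(message))
        if tag in tags:
            counts.update(tags - {tag})
    return counts


messages = ["#php is fun", "#php and #mono", "#mono rocks"]
print(related(messages, ("hashtag", "php")).most_common())
```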
<p><strong>Motivation</strong></p>
<p>The motivation to build TagWalk came from <span style="text-decoration: line-through;">three</span> four main factors:</p>
<ul>
<li>A desire to retain links from messages with the hashtags &#8211; users of Delicious explicitly tag for themselves, whereas Twitter users are sharing links. These links must be worth keeping.</li>
<li>A love of recommendation engines &#8211; my day job doesn’t require building one, and I fancied building one as a side project.</li>
<li>A love of Twitter</li>
<li>An API must be called &#8211; an unexplored API is like a big red button &#8211; it has to be pushed.</li>
</ul>
<p>All in all, about four months of late nights have gone into TagWalk as a side project and there is still lots more to do.</p>
<p><strong>Computing Reputation and Authority<br />
</strong></p>
<p>Twitter directory sites such as WeFollow rely on users to tag themselves with hashtags. This is flawed. Just because I tag myself as #studly does not make it true. Likewise, I could post lots of messages containing #boffin as a hashtag, but this relates to the content of my message, and says little about me. What is very relevant, however, is when other users use a username in a message alongside a hashtag, because these hashtags relate to that user.</p>
<p>For this reason, TagWalk differentiates the user who tweeted from a user mentioned in the tweet. When these mentions are aggregated, we are able to see hashtag experts emerging.</p>
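<p>One way to picture this aggregation is the sketch below: for every tweet, each hashtag is credited to the users mentioned in the text, never to the author. The scoring (one point per tweet) and all names are assumptions for illustration; the real weighting is not described in this post.</p>

```python
import re
from collections import Counter, defaultdict

MENTION = re.compile(r"@(\w+)")
HASHTAG = re.compile(r"#(\w+)")


def hashtag_authority(tweets):
    """Map hashtag -> Counter of mentioned users.

    `tweets` is an iterable of (author, text) pairs (hypothetical layout).
    """
    scores = defaultdict(Counter)
    for author, text in tweets:
        hashtags = {h.lower() for h in HASHTAG.findall(text)}
        mentioned = {m.lower() for m in MENTION.findall(text)}
        mentioned.discard(author.lower())  # tagging yourself earns nothing
        for tag in hashtags:
            for user in mentioned:
                scores[tag][user] += 1
    return scores


tweets = [
    ("alice", "@bob is the #php guru"),
    ("carol", "ask @bob about #php"),
    ("bob", "I am so #studly @bob"),
]
scores = hashtag_authority(tweets)
print(scores["php"].most_common(1))
```

<p>In this toy data bob is credited for #php because two other people said so, while his own #studly claim scores nothing.</p>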
<p><strong>URL Shortening</strong></p>
<p>Whenever a URL is tweeted, it is usually shrunk using a URL shortening service such as tinyurl. As different users may use different services to shorten the same link, TagWalk spiders the links to resolve the final URL. This also allows the hostname of the site to be included as an analysis tag.</p>
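<p>The resolving step can be sketched without touching the network by following a table of known redirects until a URL with no further redirect is reached. The table, the example short links and the hop limit are all hypothetical; a real spider would issue HTTP requests and read the redirect headers instead.</p>

```python
def resolve(url, redirects, max_hops=10):
    """Follow `redirects` (a dict of url -> target) to the final URL.

    Stops on a loop or after `max_hops` hops and returns the URL reached.
    """
    seen = set()
    for _ in range(max_hops):
        if url in seen or url not in redirects:
            return url
        seen.add(url)
        url = redirects[url]
    return url


# Made-up short links, purely for the example.
redirects = {
    "http://tinyurl.com/abc": "http://bit.ly/xyz",
    "http://bit.ly/xyz": "http://tagwalk.com/",
}
print(resolve("http://tinyurl.com/abc", redirects))
print(resolve("http://bit.ly/xyz", redirects))
```

<p>Both short links end up at the same final URL, so they can be counted as one link rather than two.</p>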
<p><strong>Architecture</strong></p>
<p>Unlike traditional LAMP stack applications, TagWalk is implemented using LAAMMMP, which is Linux, Apache, Amazon EC2, MySQL, Mono, Memcache, and PHP. The algorithms and data structures behind the scenes are inspired by map/reduce, and will enable distribution across many servers (if it can generate income to pay for them).</p>
<p><strong>Challenges Ahead<br />
</strong></p>
<ul>
<li>Increasing the data processing capacity. At present seeds are used to focus attention; a shift towards the real time hose-feeds would instead give a fairer slant.</li>
<li>Bootstrapping and monetization. TagWalk runs on Amazon EC2 infrastructure which costs money to run. Whilst it is a labour of love, generating income would enable more computing horsepower and therefore more features and bigger data sets.</li>
<li>Boring and stale links. As the site nears its second month of unattended operations, the link sorting algorithms have already needed tweaking to ensure a balance of popular and fresh links. There will be little incentive to return to the site if the same URLs are always at the top.</li>
<li>Spam. There are a lot of spam tweets out there, but fortunately I have the stats and algorithms to find them.</li>
</ul>
<p><strong>Credits</strong></p>
<ul>
<li>Thanks to <a href="http://twitter.com/mikenolan">@MikeNolan</a> for the photograph (via <a title="Michael Nolan" href="http://www.flickr.com/photos/mikenolan/3581552670/">flickr</a>)</li>
<li>Thanks to <a href="http://twitter.com/foodiesarah">@foodiesarah</a> for lending me her laptop and rescuing me from projector problems</li>
<li>Thanks to <a href="http://twitter.com/perki">@perki</a> for the excellent TagWalk logo</li>
<li>Thanks to everyone at BarCamp Leeds and Geekup for being a friendly audience</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://blog.tagwalk.com/2009/06/data-mining-twitter-barcamp-leeds-talk/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>PHP Problem, Failed to Open Stream &#8211; File Too Large</title>
		<link>http://blog.tagwalk.com/2009/05/php-failed-to-open-stream-file-too-large/</link>
		<comments>http://blog.tagwalk.com/2009/05/php-failed-to-open-stream-file-too-large/#comments</comments>
		<pubDate>Tue, 19 May 2009 19:44:24 +0000</pubDate>
		<dc:creator>Tim Hastings</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[32-bit]]></category>
		<category><![CDATA[mono]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[sharding]]></category>
		<category><![CDATA[tasks]]></category>

		<guid isPermaLink="false">http://blog.tagwalk.com/?p=24</guid>
		<description><![CDATA[Tonight I am going to be modifying TagWalk to work around a 32-bit file handling limitation in PHP. The fopen function creates an internal data structure to keep track of the file and the current read position. On a 32-bit platform, the default PHP build uses a signed integer as the file cursor so this gives [...]]]></description>
			<content:encoded><![CDATA[<p>Tonight I am going to be modifying TagWalk to work around a 32-bit file handling limitation in PHP.</p>
<p>The fopen function creates an internal data structure to keep track of the file and the current read position. On a 32-bit platform, the default PHP build uses a signed integer as the file cursor so this gives a maximum addressable file size of 2.1GB. If you try to open a file greater than this limit, you&#8217;ll get this error:</p>
<blockquote><p>function.fopen: failed to open stream: File too large</p></blockquote>
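<p>The 2.1GB figure is simply the largest value a signed 32-bit integer can hold. The few lines below show the arithmetic; the helper name is made up for the example, and it only checks a size against the limit rather than reproducing anything PHP does internally.</p>

```python
# Largest offset a signed 32-bit file cursor can represent.
MAX_SIGNED_32 = 2**31 - 1


def fits_in_signed_32(size_bytes):
    """True if a file of this size can be addressed by a signed 32-bit cursor."""
    return 0 <= size_bytes <= MAX_SIGNED_32


print(MAX_SIGNED_32)                     # limit in bytes
print(MAX_SIGNED_32 / 10**9)             # the same limit in decimal gigabytes
print(fits_in_signed_32(3 * 10**9))      # a 3GB data file
```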
<p>Apparently, there is a build option for PHP which can be used to compile PHP with a 64-bit integer in the file handling data structure, but I want to use standard &#8220;found anywhere&#8221; PHP rather than start tinkering. Using standard PHP makes it easier for me to move to new servers, either on Amazon EC2 or elsewhere.</p>
<p>Plus, this is a trigger point for me to implement some of the sharding functionality I had planned, which would enable the scaling of TagWalk across multiple servers and the parallel processing of separate shards.</p>
<p>Tonight&#8217;s tasks include:</p>
<ul>
<li>Writing and running the data splitter to create the shards. This will be in the Mono backend tools.</li>
<li>Creating the PHP shard switching classes. These will need the shard lookup data used to resolve each tag to the appropriate shard. Ultimately this will live in memcache.</li>
</ul>
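<p>A shard lookup along the lines of the second task might look something like the sketch below. It is written in Python rather than PHP purely for illustration, uses a CRC32 hash to assign tags to shards (an assumption, since the post does not say how tags are assigned), and uses a plain dictionary as a stand-in for memcache. The class name, shard count and file naming are all hypothetical.</p>

```python
import zlib


class ShardLookup:
    """Resolve a tag to a shard number, remembering answers in a cache."""

    def __init__(self, shard_count, cache=None):
        self.shard_count = shard_count
        # A dict stands in for memcache in this sketch.
        self.cache = {} if cache is None else cache

    def shard_for(self, tag):
        key = tag.lower()
        if key not in self.cache:
            # Stable hash, so every server assigns the same tag to the same shard.
            self.cache[key] = zlib.crc32(key.encode("utf-8")) % self.shard_count
        return self.cache[key]

    def path_for(self, tag):
        """Hypothetical data file name for the shard holding this tag."""
        return "shard-%02d.dat" % self.shard_for(tag)


lookup = ShardLookup(shard_count=4)
print(lookup.path_for("php"), lookup.path_for("mono"))
```

<p>Splitting the data this way keeps each shard file well below the 32-bit limit as long as there are enough shards, and each shard can be processed on its own.</p>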
<p>If I can get all this done tonight &#8211; I&#8217;ll be very pleased.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.tagwalk.com/2009/05/php-failed-to-open-stream-file-too-large/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>TagWalk first demo</title>
		<link>http://blog.tagwalk.com/2009/04/greetings/</link>
		<comments>http://blog.tagwalk.com/2009/04/greetings/#comments</comments>
		<pubDate>Thu, 23 Apr 2009 21:48:12 +0000</pubDate>
		<dc:creator>Tim Hastings</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[demo]]></category>
		<category><![CDATA[GeekUp]]></category>

		<guid isPermaLink="false">http://blog.tagwalk.com/?p=1</guid>
		<description><![CDATA[Hello. My name is Tim Hastings (@timhastings), and welcome to the TagWalk blog, which is intended to chronicle the development of http://tagwalk.com/ TagWalk has been a side project of mine which I started around February 2009. On Monday night (20th April), I demonstrated progress-so-far to a small and friendly crowd at Geekup Preston. It was [...]]]></description>
			<content:encoded><![CDATA[<p>Hello. My name is Tim Hastings (<a title="Tim Hastings on Twitter" href="http://twitter.com/timhastings">@timhastings</a>), and welcome to the TagWalk blog, which is intended to chronicle the development of <a title="TagWalk - taking a sneeky peek into Twitter" href="http://tagwalk.com/">http://tagwalk.com/</a></p>
<p>TagWalk has been a side project of mine which I started around February 2009. On Monday night (20th April), I demonstrated progress-so-far to a small and friendly crowd at <a title="Geekup" href="http://www.geekup.org/">Geekup</a> Preston. It was well received and I got lots of feedback and ideas.</p>
<p>It prompted lots of discussion around hashtags, which are a main focus of TagWalk. In particular, we discussed the ambiguity of hashtags: different people will use different hashtags to classify the same thing, which can make them a difficult mechanism for retrieval unless you have pre-arranged which tag should be used. This happens a lot at conferences and similar events.</p>
<p>This blog&#8217;s purpose is to capture the progress of TagWalk and to provide some background information into what&#8217;s going on here.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.tagwalk.com/2009/04/greetings/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

