<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0"><channel><atom:link rel="hub" href="http://tumblr.superfeedr.com/" xmlns:atom="http://www.w3.org/2005/Atom"/><description>I am a Computer Science researcher interested in data analysis, machine learning and artificial intelligence. I play with Python, Java, C++, MATLAB, and Ruby.</description><title>traims' blog</title><generator>Tumblr (3.0; @traims)</generator><link>http://traims.com/</link><item><title>Copula-based clustering of datasets with multivariate dependence structure: a brief overview and links</title><description>&lt;p&gt;I have just attended a talk on copula-based clustering algorithms, where the work of &lt;a href="http://stat.unibo.it/dilascio/" target="_blank"&gt;Francesca Marta Lilja Di Lascio&lt;/a&gt; and &lt;a href="http://www2.stat.unibo.it/giannerini/default_en.htm" target="_blank"&gt;Simone Giannerini&lt;/a&gt; was presented.&lt;/p&gt;
&lt;p&gt;&lt;span&gt;&lt;strong&gt;Copula-based clustering&lt;/strong&gt; &lt;/span&gt;&lt;span&gt;is a kind of model-based clustering where each cluster is modeled as a set of realizations of one random variable. To model k clusters, k-dimensional &lt;/span&gt;&lt;a href="http://mathworld.wolfram.com/Copula.html" target="_blank"&gt;copulas&lt;/a&gt;&lt;span&gt; are used. &lt;/span&gt;Different copula families can be used to model the data; choosing among them is up to the researcher.&lt;/p&gt;
&lt;p&gt;The method is particularly useful for datasets with a high dependence between different clusters. Its performance degrades when dependence is lower. &lt;/p&gt;
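&lt;p&gt;The CoClust algorithm itself is more involved, so the sketch below only illustrates the basic building block for the two-cluster case (k = 2): each variable is mapped to pseudo-observations in (0, 1) via ranks, and the dependence between the two is scored with the log-likelihood of a bivariate Gaussian copula. The Gaussian family and the helper names (&lt;code&gt;pseudo_observations&lt;/code&gt;, &lt;code&gt;gaussian_copula_loglik&lt;/code&gt;, &lt;code&gt;fit_rho&lt;/code&gt;) are hypothetical choices for illustration, not part of CoClust.&lt;/p&gt;

```python
import math
from statistics import NormalDist

def pseudo_observations(xs):
    """Rank-transform a sample into (0, 1): u_i = rank_i / (n + 1)."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u

def gaussian_copula_loglik(u, v, rho):
    """Log-likelihood of pseudo-observations (u, v) under a bivariate Gaussian copula."""
    nd = NormalDist()  # standard normal, used for the quantile transform
    total = 0.0
    for ui, vi in zip(u, v):
        x, y = nd.inv_cdf(ui), nd.inv_cdf(vi)
        # log c(u, v) = -0.5 log(1 - rho^2) - (rho^2 (x^2 + y^2) - 2 rho x y) / (2 (1 - rho^2))
        total += (-0.5 * math.log(1 - rho ** 2)
                  - (rho ** 2 * (x * x + y * y) - 2 * rho * x * y) / (2 * (1 - rho ** 2)))
    return total

def fit_rho(u, v, grid=None):
    """Crude maximum-likelihood estimate of rho by grid search inside (-1, 1)."""
    grid = grid or [i / 100 for i in range(-95, 96, 5)]
    return max(grid, key=lambda r: gaussian_copula_loglik(u, v, r))
```

&lt;p&gt;At rho = 0 the log-likelihood is exactly zero (the independence baseline), so strongly dependent variables stand out through a clearly positive log-likelihood at the fitted rho.&lt;/p&gt;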
&lt;p&gt;&lt;span&gt;The &lt;/span&gt;&lt;strong&gt;R package&lt;/strong&gt;&lt;span&gt; CoClust, which implements a new clustering method based on copulas, is available on CRAN: &lt;/span&gt;&lt;a href="http://cran.r-project.org/web/packages/CoClust/" target="_blank"&gt;http://cran.r-project.org/web/packages/CoClust/&lt;/a&gt; (&lt;a href="http://www2.stat.unibo.it/giannerini/CoClust/CoClust_help.htm" target="_blank"&gt;R Help&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;References&lt;/strong&gt;&lt;span&gt;:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;&lt;li&gt;&lt;span&gt;Di Lascio, F. M. L., and Giannerini, S. (2012). A Copula-Based Algorithm for Discovering Patterns of Dependent Observations. &lt;em&gt;Journal of Classification&lt;/em&gt;, 29(1), pp. 50-75.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span&gt;Di Lascio, F. M. L. (2008). Analyzing the dependence structure of microarray data: A copula-based approach. &lt;em&gt;Ph.D. Dissertation&lt;/em&gt;, University of Bologna, Bologna, Italy. Available at &lt;a href="http://amsdottorato.cib.unibo.it/670/" target="_blank"&gt;http://amsdottorato.cib.unibo.it/670/&lt;/a&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ol&gt;</description><link>http://traims.com/post/40696452321</link><guid>http://traims.com/post/40696452321</guid><pubDate>Wed, 16 Jan 2013 20:18:00 +0100</pubDate><category>statistics</category><category>data analysis</category><category>clustering</category><category>rstats</category></item><item><title>Python for Data Analysis, 18 Oct 2012, London</title><description>&lt;p&gt;I attended the &amp;#8220;Python for Data Analysis&amp;#8221; meeting organised by Data Science London. There were two main talks: one by Didrik Pinte from Enthought and one by Wes McKinney, creator of &lt;a href="http://pandas.pydata.org/" target="_blank"&gt;pandas&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;h2&gt;NumPy, the Python foundation for number crunching&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;by Didrik Pinte&lt;/strong&gt; &lt;a class="twitter-atreply pretty-link" href="https://twitter.com/dpinte" target="_blank"&gt;@&lt;strong&gt;dpinte&lt;/strong&gt;&lt;/a&gt; — &lt;em&gt;Python contributor to &lt;a href="http://quantlib.org/index.shtml" target="_blank"&gt;QuantLib&lt;/a&gt; (a library for quant finance), and MD of Enthought, developer of &lt;a href="http://www.enthought.com/products/epd.php" target="_blank"&gt;EPD, the scientific computing Python platform&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;p&gt;&lt;a href="http://pics.lockerz.com/s/254145178" target="_blank"&gt;&lt;img alt="image" src="http://media.tumblr.com/27ac1c45441347f534fb52fefacef11e/tumblr_inline_mfar5hH9aM1rre7mr.jpg"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class="stream-item-header"&gt;(c) by Data Science London &lt;a class="stream-item-header" href="https://twitter.com/ds_ldn" target="_blank"&gt;‏&lt;span class="username js-action-profile-name"&gt;@&lt;strong&gt;ds_ldn&lt;/strong&gt;&lt;/span&gt;&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/blockquote&gt;
&lt;p&gt;About half of the audience had already used NumPy, though I think only a couple of people had gone deep into C integration and memory optimizations. So the talk mixed an introduction with demonstrations of Cython code and profiling tools.&lt;/p&gt;
&lt;p&gt;Interestingly, when someone decided to port NumPy to .NET, it didn&amp;#8217;t work efficiently because of unpredictable garbage collection in .NET.&lt;/p&gt;
&lt;p&gt;Didrik also showed how the memory monitor from &lt;a href="https://github.com/sjagoe/pikos" target="_blank"&gt;Pikos&lt;/a&gt; works.&lt;/p&gt;
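&lt;p&gt;NumPy's memory optimizations build on its strided layout: an ndarray is a view of a flat buffer described by a shape and per-axis byte strides (the quiz question about ndarray strides mentioned below touched on this). As a rough illustration in plain Python, so it runs without NumPy, here is how the strides of a C-contiguous (row-major) array can be computed; for a float64 array of shape (3, 4) the result, (32, 8), matches what NumPy reports in &lt;code&gt;ndarray.strides&lt;/code&gt;. The helper names are hypothetical.&lt;/p&gt;

```python
def c_order_strides(shape, itemsize):
    """Byte strides of a C-contiguous (row-major) array with the given shape."""
    strides = []
    step = itemsize
    for dim in reversed(shape):  # the last axis is contiguous
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))

def byte_offset(index, strides):
    """Offset in bytes of the element at `index` from the start of the buffer."""
    return sum(i * s for i, s in zip(index, strides))

def transposed_strides(strides):
    """A transpose reuses the same buffer and only reverses the strides."""
    return tuple(reversed(strides))

s = c_order_strides((3, 4), 8)   # float64 items are 8 bytes
print(s, byte_offset((1, 2), s), transposed_strides(s))
```

&lt;p&gt;Because a transpose only swaps the strides, &lt;code&gt;a.T&lt;/code&gt; in NumPy is a view and copies no data.&lt;/p&gt;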
&lt;blockquote&gt;
&lt;div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;h2&gt;Python for Data Analysis&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;by Wes McKinney&lt;/strong&gt;&lt;span&gt; &lt;/span&gt;&lt;span&gt;&lt;strong&gt;&lt;a href="https://twitter.com/wesmckinn" target="_blank"&gt;@wesmckinn&lt;/a&gt;&lt;/strong&gt; — &lt;em&gt;former quant, author of &lt;/em&gt;&lt;/span&gt;&lt;em&gt;&lt;a href="http://pandas.pydata.org/" target="_blank"&gt;pandas&lt;/a&gt; (the powerful Python library for data analysis), author of the book &amp;#8220;&lt;a href="http://shop.oreilly.com/product/0636920023784.do" target="_blank"&gt;Python for Data Analysis&lt;/a&gt;&amp;#8221;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;a href="http://pics.lockerz.com/s/254157104" target="_blank"&gt;&lt;img src="http://media.tumblr.com/ea7d8453b46a77634d728c6c16d9f1d3/tumblr_inline_mfarn6rzPi1rre7mr.jpg"/&gt;&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;div class="stream-item-header"&gt;(c) by Data Science London &lt;a class="account-group js-account-group js-action-profile js-user-profile-link js-nav" href="https://twitter.com/ds_ldn" target="_blank"&gt;‏&lt;span class="username js-action-profile-name"&gt;@&lt;strong&gt;ds_ldn&lt;/strong&gt;&lt;/span&gt;&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/blockquote&gt;
&lt;p&gt;Most of the talk was given in the IPython notebook. Using the &lt;a href="http://www.grouplens.org/node/73" target="_blank"&gt;MovieLens dataset&lt;/a&gt; as an example, Wes showed various pandas functions: data slicing, merge, map, etc. The library is also good for data munging, cleaning and preparation.&lt;/p&gt;
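&lt;p&gt;A minimal sketch of the kind of operations shown in the talk: merging, mapping, slicing and grouping. The two tiny tables only mimic the shape of the MovieLens ratings and movie files; their values and column names are made up for illustration.&lt;/p&gt;

```python
import pandas as pd

# Toy tables shaped like MovieLens ratings and movie titles (values are invented).
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3],
    "movie_id": [10, 20, 10, 20, 30],
    "rating": [5, 3, 4, 2, 4],
})
movies = pd.DataFrame({
    "movie_id": [10, 20, 30],
    "title": ["Movie A", "Movie B", "Movie C"],
})

# merge: join the tables on their shared key column (inner join by default).
data = pd.merge(ratings, movies, on="movie_id")

# map: derive a new column by mapping each rating through a dict.
data["liked"] = data["rating"].map({1: False, 2: False, 3: False, 4: True, 5: True})

# slicing: a boolean mask selects rows, a list of names selects columns.
high = data[data["rating"] >= 4][["user_id", "title"]]

# group and aggregate: mean rating per title.
mean_by_title = data.groupby("title")["rating"].mean()
print(mean_by_title)
```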
&lt;p&gt;He said they are working on further improvements to the library, motivated by cases where people open a 5&amp;#160;GB Kaggle dataset and the system ends up using 20&amp;#160;GB of memory.&lt;/p&gt;
&lt;p&gt;Rmagic, an IPython extension for running R code from Python, is useful, for example, for the ggplot2 library, which has no match in the Python world.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://shop.oreilly.com/product/0636920023784.do" target="_blank"&gt;&amp;#8220;Python for Data Analysis&amp;#8221; book&lt;/a&gt; is an introduction to pandas with working code examples, a better learning material than plain documentation. Print copies are not available yet; books will probably appear for Strata New York. &lt;em&gt;(I have just checked my O&amp;#8217;Reilly account, my copy is not listed as an &amp;#8220;early release&amp;#8221; anymore).&lt;/em&gt; &lt;/p&gt;
&lt;p&gt;The first speaker, Didrik, added two reasons to use pandas:&lt;/p&gt;
&lt;ol&gt;&lt;li&gt;its excellent documentation; &lt;/li&gt;
&lt;li&gt;&lt;a href="http://github.com/pydata/vbench" target="_blank"&gt;vbench&lt;/a&gt;, performance benchmarking for pandas.&lt;/li&gt;
&lt;/ol&gt;&lt;div&gt;Random notes:&lt;/div&gt;
&lt;ul&gt;&lt;li&gt;Carlos, the event host, offered three O&amp;#8217;Reilly books as quiz prizes. Didrik asked a question about ndarray strides. Wes asked who created the J language, which inspired NumPy.&lt;/li&gt;
&lt;li&gt;Didrik was using Canopy on his Mac. Wes was using the IPython notebook.&lt;/li&gt;
&lt;li&gt;Meetup page of the event: &lt;a href="http://www.meetup.com/Data-Science-London/events/85448442/" target="_blank"&gt;http://www.meetup.com/Data-Science-London/events/85448442/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description><link>http://traims.com/post/33883718232</link><guid>http://traims.com/post/33883718232</guid><pubDate>Fri, 19 Oct 2012 09:13:00 +0200</pubDate><category>data analysis</category><category>python</category><category>london</category><category>numpy</category><category>pandas</category><category>data science</category><category>my notes</category></item><item><title>OKFN and the European Journalism Centre</title><description>&lt;p&gt;&lt;strong&gt;Open Interests Europe Hackathon&lt;/strong&gt; will take place in London, November 24 and 25. &lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Open Interests Europe brings together developers, designers, activists, journalists and other geeks for two days of learning, fun, intense hacking and app building.&lt;/p&gt;
&lt;p&gt;How EU money is spent is an issue that concerns everyone who pays taxes to the EU. As the influence of Brussels lobbyists grows, it is increasingly important to draw the connections between lobbying, policy-making and funding. Journalists and activists need browsable databases, tools and platforms to investigate lobbyists’ influence and where the money goes in the EU. Join us and help build these tools!&lt;/p&gt;
&lt;p&gt;The Hackathon challenges include Lobbying Transparency and Fish Subsidies.&lt;/p&gt;
&lt;p&gt;Organised by the Open Knowledge Foundation and the European Journalism Centre, sponsored by Knight-Mozilla OpenNews.&lt;/p&gt;
&lt;p&gt;For more details, please see the event website: &lt;br/&gt;&lt;a href="http://okfnlabs.org/events/hackdays/lobbying.html" title="http://okfnlabs.org/events/hackdays/lobbying.html" target="_blank"&gt;http://okfnlabs.org/events/hackdays/lobbying.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;and register at &lt;a href="http://openinterests.eventbrite.com" title="http://openinterests.eventbrite.com" target="_blank"&gt;http://openinterests.eventbrite.com&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Related links:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;a href="http://www.meetup.com/OpenKnowledgeFoundation/London-GB/815352/" target="_blank"&gt;Meetup page&lt;/a&gt; of the event.&lt;/li&gt;
&lt;li&gt;&lt;a href="http://blog.okfn.org/2012/10/15/openinterests/" target="_blank"&gt;OKFN blog entry&lt;/a&gt;&lt;br/&gt;&lt;br/&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://datadrivenjournalism.net/%20" target="_blank"&gt;&amp;#8220;Data-Driven Journalism&amp;#8221;&lt;/a&gt;, a project by EJC/OKF.&lt;/li&gt;
&lt;li&gt;&lt;a href="http://datajcrew.sudmediatika.it/?p=1241" target="_blank"&gt;Data Journalism Courses: a map in progress&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://lilianabounegru.org/" target="_blank"&gt;Website of Liliana Bounegru&lt;/a&gt; from the European Journalism Center. I met Liliana at O&amp;#8217;Reilly Strata London, where she did book signing (she&amp;#8217;s a co-author of the &lt;em&gt;Data Journalism Handbook&lt;/em&gt;) and moderated a panel discussion on data journalism.&lt;/li&gt;
&lt;li&gt;Twitter: #ddj&lt;/li&gt;
&lt;/ul&gt;</description><link>http://traims.com/post/33730184579</link><guid>http://traims.com/post/33730184579</guid><pubDate>Tue, 16 Oct 2012 23:53:18 +0200</pubDate></item><item><title>Yuri Suzuki: London Underground Circuit Map</title><description>&lt;img src="http://24.media.tumblr.com/tumblr_mabt0tleLt1rz4uayo1_500.jpg"/&gt;&lt;br/&gt;&lt;br/&gt;&lt;div&gt;&lt;strong&gt;Yuri Suzuki: London Underground Circuit Map &lt;/strong&gt;
&lt;blockquote&gt;
&lt;div&gt;
&lt;p&gt;&lt;em&gt;images © &lt;a href="http://www.hitomiyoda.com/" title="" target="_blank"&gt;hitomi kai yoda&lt;/a&gt;&lt;/em&gt;&lt;br/&gt;&lt;br/&gt;Japanese designer &lt;a href="http://yurisuzuki.com/" title="" target="_blank"&gt;Yuri Suzuki&lt;/a&gt; has sent DesignBoom images of his ‘London underground circuit maps’ project, developed as part of the &lt;a href="http://designmuseum.org/exhibitions/2012/designers-in-residence-2012" title="" target="_blank"&gt;Designers in Residence program&lt;/a&gt; at the London Design Museum, on show until January 13th, 2013.&lt;/p&gt;
&lt;/div&gt;
&lt;/blockquote&gt;
&lt;/div&gt;</description><link>http://traims.com/post/31511159227</link><guid>http://traims.com/post/31511159227</guid><pubDate>Fri, 14 Sep 2012 08:07:00 +0200</pubDate><category>london</category><category>map</category><category>visualization</category><category>design</category></item><item><title>Palo Alto looks to use open data to embrace ‘city as a platform’ -- O'Reilly Radar</title><description>&lt;blockquote&gt;
&lt;div&gt;
&lt;p&gt;&amp;#8220;The city of Palo Alto in California joined over a dozen &lt;a href="http://www.data.gov/opendatasites" target="_blank"&gt;cities around the United States&lt;/a&gt; and globe when it launched its own &lt;a href="http://paloalto.opendata.junar.com/" target="_blank"&gt;open data platform&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The city initially published open datasets that include the 2010 census data, pavement condition, city tree locations, park locations, bicycle paths and hiking trails, creek water level, rainfall and utility data. Open data about Palo Alto budgets, campaign finance, government salaries, regulations, licensing, or performance — which would all offer more insight into traditional metrics for government accountability — were not part of this first release.&lt;/p&gt;
&lt;p&gt;&lt;span&gt;The platform includes an application programming interface (API) which enables direct access through a RESTful interface to open government data published in a JSON format. &lt;/span&gt;&lt;span&gt;Datasets can also be embedded like YouTube videos.&lt;/span&gt;&amp;#8221;&lt;/p&gt;
&lt;/div&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;span&gt; — &lt;a href="http://radar.oreilly.com/2012/08/palo-alto-looks-to-use-open-data-to-embrace-city-as-a-platform.html" target="_blank"&gt;Palo Alto looks to use open data to embrace ‘city as a platform’, O&amp;#8217;Reilly Radar&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;
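&lt;p&gt;The quote does not describe the Junar API itself, so this sketch assumes only a generic RESTful endpoint that returns JSON. The base URL, path and query parameters are hypothetical placeholders, and &lt;code&gt;build_url&lt;/code&gt; and &lt;code&gt;fetch_json&lt;/code&gt; are illustrative helpers, not part of any platform's client library.&lt;/p&gt;

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def build_url(base, path, **params):
    """Join a base URL, a resource path and optional query parameters."""
    url = base.rstrip("/") + "/" + path.lstrip("/")
    query = urlencode(params)
    return url + "?" + query if query else url

def fetch_json(url, timeout=10):
    """GET a URL and decode its JSON body (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

# Hypothetical usage; the endpoint below is a placeholder, not a documented API:
# data = fetch_json(build_url("http://example.org/api", "datastreams/some-id", limit=10))
```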
&lt;h2&gt;Data samples&lt;/h2&gt;
&lt;p&gt;&lt;iframe frameborder="0" height="175" src="http://paloalto.opendata.junar.com/datastreams/embed/PALO-ALTO-SINGL-FAMIL-PARCE?header_row=0&amp;amp;fixed_column=0" title="Palo Alto Single Family Parcel Assessed Valuation" width="400"&gt;&lt;/iframe&gt;&lt;/p&gt;
&lt;p&gt;&lt;iframe frameborder="0" height="175" src="http://paloalto.opendata.junar.com/datastreams/embed/PRESC-2012-FALL-CLASS-AND?header_row=0&amp;amp;fixed_column=0" title="Preschoolers - 2012 Fall Classes and Activities in Palo Alto" width="400"&gt;&lt;/iframe&gt;&lt;/p&gt;
&lt;p&gt;Powered by &lt;a href="http://www.junar.com" title="Junar · Discovering Data" target="_blank"&gt;Junar&lt;/a&gt;&lt;/p&gt;</description><link>http://traims.com/post/28767957004</link><guid>http://traims.com/post/28767957004</guid><pubDate>Sun, 05 Aug 2012 17:29:00 +0200</pubDate><category>open data</category></item><item><title>CSV kit: a set of command line tools for handling CSV files</title><description>&lt;a href="http://csvkit.readthedocs.org/en/latest/index.html"&gt;CSV kit: a set of command line tools for handling CSV files&lt;/a&gt;: &lt;ul&gt;&lt;li&gt;&lt;strong&gt;csvkit&lt;/strong&gt; is to tabular data what the standard Unix text processing suite (grep, sed, cut, sort) is to text.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;csvkit&lt;/strong&gt; is designed to augment or supersede much of Python’s &lt;strong&gt;&lt;a class="reference external" href="http://docs.python.org/2.7/library/csv.html#csv" title="(in Python v2.7)" target="_blank"&gt;&lt;span class="pre"&gt;csv&lt;/span&gt;&lt;/a&gt;&lt;/strong&gt; module.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;csvkit&lt;/strong&gt; tutorial walks through processing and analyzing a real dataset from &lt;a class="reference external" href="http://data.gov" target="_blank"&gt;data.gov&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;Repository: &lt;a class="reference external" href="https://github.com/onyxfish/csvkit" target="_blank"&gt;https://github.com/onyxfish/csvkit&lt;/a&gt;&lt;br/&gt;Documentation: &lt;a class="reference external" href="http://csvkit.rtfd.org/" target="_blank"&gt;http://csvkit.rtfd.org/&lt;/a&gt;&lt;/p&gt;
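&lt;p&gt;To give a feel for what such a tool automates, here is a rough standard-library equivalent of selecting columns by name, the job of csvkit's &lt;code&gt;csvcut -c&lt;/code&gt;. The function name &lt;code&gt;cut_columns&lt;/code&gt; is hypothetical, and it skips most of what csvkit offers (column indices, encoding options and so on).&lt;/p&gt;

```python
import csv
import io

def cut_columns(text, columns):
    """Return CSV text containing only the named columns, in the given order."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    # extrasaction="ignore" drops every field that is not listed in `columns`
    writer = csv.DictWriter(out, fieldnames=columns, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()

print(cut_columns("a,b,c\n1,2,3\n4,5,6\n", ["c", "a"]))
```

&lt;p&gt;Going through the csv module rather than splitting lines on commas matters: a quoted field such as "Smith, J" stays in one piece.&lt;/p&gt;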
&lt;p&gt;via &lt;a href="http://okfn.org/members/markw/" target="_blank"&gt;@markw&lt;/a&gt;&lt;/p&gt;</description><link>http://traims.com/post/28765449972</link><guid>http://traims.com/post/28765449972</guid><pubDate>Sun, 05 Aug 2012 16:25:00 +0200</pubDate><category>data analysis</category><category>csv</category></item><item><title>London: A Twisted Underground Map | Minefield Junction</title><description>&lt;a href="http://dansd.com/about-the-twisted-underground-map/"&gt;London: A Twisted Underground Map | Minefield Junction&lt;/a&gt;: &lt;blockquote&gt;
&lt;p&gt;&lt;span&gt;It’s a rather strange version of Harry Beck’s London Underground map, in which I took the approach of “reflecting the topology, ignoring the geography” to the limit. &lt;/span&gt;&lt;/p&gt;
&lt;/blockquote&gt;</description><link>http://traims.com/post/28755701836</link><guid>http://traims.com/post/28755701836</guid><pubDate>Sun, 05 Aug 2012 10:16:00 +0200</pubDate><category>london</category><category>map</category><category>design</category><category>visualization</category></item><item><title>EuroPython started today.</title><description>&lt;img src="http://25.media.tumblr.com/tumblr_m6jl2iQrP41rz4uayo2_500.jpg"/&gt;&lt;br/&gt; T-shirts and MongoDB mug&lt;br/&gt;&lt;br/&gt; &lt;img src="http://24.media.tumblr.com/tumblr_m6jl2iQrP41rz4uayo4_500.jpg"/&gt;&lt;br/&gt; Coffee break for 700+ people &lt;br/&gt;&lt;br/&gt; &lt;img src="http://24.media.tumblr.com/tumblr_m6jl2iQrP41rz4uayo1_500.jpg"/&gt;&lt;br/&gt; Guides and leaflets&lt;br/&gt;&lt;br/&gt; &lt;img src="http://24.media.tumblr.com/tumblr_m6jl2iQrP41rz4uayo3_r1_500.jpg"/&gt;&lt;br/&gt; Waiting for the opening&lt;br/&gt;&lt;br/&gt; &lt;p&gt;&lt;a href="https://ep2012.europython.eu/p3/live/" title="EuroPython 2012, Florence, July 2-8" target="_blank"&gt;EuroPython&lt;/a&gt; started today.&lt;/p&gt;</description><link>http://traims.com/post/26352910841</link><guid>http://traims.com/post/26352910841</guid><pubDate>Mon, 02 Jul 2012 18:42:00 +0200</pubDate><category>python</category><category>europython</category><category>mongodb</category><category>florence</category></item><item><title>UCR Insect Classification Contest</title><description>&lt;p&gt;&lt;a href="http://www.cs.ucr.edu/~eamonn/CE/contest.htm" target="_blank"&gt;www.cs.ucr.edu/~eamonn/CE/contest.htm&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&amp;#8220;Dr. Keogh&amp;#8217;s lab is hosting a contest to create a similarity measure for &lt;strong&gt;insect flight sounds&lt;/strong&gt; (there is anecdotal evidence that music similarity algorithms may work).&lt;/p&gt;
&lt;p&gt;If we could correctly identify insects using cheap sensors, we could:&lt;br/&gt; 1) Plan malaria interventions more effectively, perhaps saving some of the million human lives lost each year.&lt;br/&gt; 2) Plan insect pest crop interventions more effectively, thus growing more food with less pollution, less energy, less environmental damage etc.&amp;#8221;&lt;/p&gt;
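&lt;p&gt;Since entries are scored by 1-nearest neighbor classification, any proposed distance measure can be plugged into a classifier like the one sketched below. Euclidean distance is only a placeholder baseline, and the toy feature vectors stand in for whatever representation of the flight sounds one chooses; the function names are hypothetical and this does not reflect the contest's actual data format or scoring script.&lt;/p&gt;

```python
import math

def euclidean(a, b):
    """Baseline distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def one_nn_predict(train, query, distance=euclidean):
    """Label of the training example closest to `query` under `distance`."""
    _, label = min(train, key=lambda example: distance(example[0], query))
    return label

def one_nn_accuracy(train, test, distance=euclidean):
    """Fraction of test examples whose 1-NN label matches the true label."""
    hits = sum(one_nn_predict(train, x, distance) == y for x, y in test)
    return hits / len(test)

# Toy (features, label) pairs; a better distance measure should raise the accuracy.
train = [([0, 0], "a"), ([10, 10], "b")]
print(one_nn_accuracy(train, [([9, 9], "b"), ([1, 1], "a")]))
```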
&lt;p&gt;&lt;strong&gt;Phase I: July to November 16th 2012&lt;/strong&gt;&lt;br/&gt;• The task is to produce the best distance (similarity) measure for insect flight sounds.&lt;br/&gt;• The contest will be scored by 1-nearest neighbor classification.&lt;/p&gt;</description><link>http://traims.com/post/26271111249</link><guid>http://traims.com/post/26271111249</guid><pubDate>Sun, 01 Jul 2012 13:55:36 +0200</pubDate><category>distance</category><category>similarity</category><category>data analysis</category><category>contest</category><category>insects</category><category>music</category></item><item><title>Simeon Lobo: Useful Hadoop and MapReduce Algorithms</title><description>&lt;a href="http://sim.miadal.com/post/21572887920"&gt;Simeon Lobo: Useful Hadoop and MapReduce Algorithms&lt;/a&gt;: &lt;p&gt;&lt;a class="tumblr_blog" href="http://sim.miadal.com/post/21572887920" target="_blank"&gt;simeonlobo&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;

&lt;div&gt;I’ve chosen to reproduce the below list from &lt;a href="http://bit.ly/vzHrUt" target="_blank"&gt;Amund Tveit’s article&lt;/a&gt; so I can maintain a backed-up personal reference.&lt;br/&gt;&lt;br/&gt;I also intend to update this collection of Hadoop MapReduce algorithms based on my growing experience with the platform. &lt;br/&gt;&lt;br/&gt;&lt;br/&gt;&lt;strong&gt;Artificial Intelligence / Machine Learning / Data…&lt;/strong&gt;&lt;/div&gt;
&lt;/blockquote&gt;</description><link>http://traims.com/post/25712952779</link><guid>http://traims.com/post/25712952779</guid><pubDate>Sat, 23 Jun 2012 15:03:41 +0200</pubDate></item><item><title>Add GitHub gists to your Tumblr</title><description>&lt;p&gt;&lt;a class="tumblr_blog" href="http://mutatedmonkeygenes.tumblr.com/post/23675573067/https-gist-github-com-1395926" target="_blank"&gt;mutatedmonkeygenes&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div class="gist"&gt;&lt;a href="https://gist.github.com/1395926" target="_blank"&gt;&lt;a href="https://gist.github.com/1395926" target="_blank"&gt;https://gist.github.com/1395926&lt;/a&gt;&lt;/a&gt;&lt;/div&gt;
&lt;/blockquote&gt;</description><link>http://traims.com/post/25712558664</link><guid>http://traims.com/post/25712558664</guid><pubDate>Sat, 23 Jun 2012 14:49:44 +0200</pubDate></item><item><title>Papers on massive data mining</title><description>&lt;p&gt;The page of the &lt;a href="https://sites.google.com/site/madamsresearch/home/summer-school-on-massive-data-mining" target="_blank"&gt;Summer School on Massive Data Mining&lt;/a&gt; (August 8-10, 2012, Denmark) lists several papers as recommended reading:&lt;/p&gt;
&lt;p&gt;Mining patterns from large datasets:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;a href="http://www.cs.ucsb.edu/%7Exyan/papers/dmkd07_frequentpattern.pdf" rel="nofollow" target="_blank"&gt;Frequent pattern mining: current status and future directions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://adrem.ua.ac.be/%7Egoethals/software/survey.pdf" rel="nofollow" target="_blank"&gt;Survey on Frequent Pattern Mining&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;Matrix and graph algorithms:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;a href="http://arxiv.org/abs/1203.0786" rel="nofollow" target="_blank"&gt;Approximate Computation and Implicit Regularization for Very Large-scale Data Analysis&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://arxiv.org/abs/1010.1609" rel="nofollow" target="_blank"&gt;Algorithmic and Statistical Perspectives on Large-Scale Data Analysis&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;Recommended reading for two more talks (mining large graphs, and clustering/metaclustering) is to be announced.&lt;/p&gt;</description><link>http://traims.com/post/25708319347</link><guid>http://traims.com/post/25708319347</guid><pubDate>Sat, 23 Jun 2012 11:47:58 +0200</pubDate><category>data mining</category><category>data analysis</category></item><item><title>Hack/Reduce: another kind of business incubator </title><description>&lt;p&gt;Thanks to a Twitter recommendation, I stumbled upon &lt;a href="http://www.hackreduce.org/" target="_new"&gt;Hack/Reduce&lt;/a&gt;, a non-profit initiative from Massachusetts.&lt;/p&gt;
&lt;blockquote&gt;Hack/reduce is a community hacker space in Boston. We provide a space where people working with Big Data can come to share infrastructure, resources and knowledge.&lt;/blockquote&gt;
&lt;p&gt;It seems quite interesting. More broadly, this is yet another approach to creating a &amp;#8220;business incubator&amp;#8221;.&lt;/p&gt;
&lt;blockquote&gt;Hack/Reduce will develop the necessary talent to create companies and jobs to shape the future in the Big Data driven economy.
&lt;ul&gt;&lt;li&gt;It’s about hands-on learning: tools, techniques, best practices, etc.&lt;/li&gt;
&lt;li&gt;It’s also about connecting people that don’t normally connect: super geeks, business, artists; people that are passionate about content.&lt;/li&gt;
&lt;/ul&gt;
— &lt;em&gt;&lt;a href="http://bostinno.com/all-series/the-most-important-job-in-boston-a-qa-with-chris-lynch-on-hackreduce/" target="_new"&gt;“The most important job in Boston”: A Q&amp;amp;A with Chris Lynch on Hack/Reduce&lt;/a&gt;&lt;/em&gt;&lt;/blockquote&gt;</description><link>http://traims.com/post/25438841642</link><guid>http://traims.com/post/25438841642</guid><pubDate>Tue, 19 Jun 2012 18:19:00 +0200</pubDate><category>community</category><category>data analysis</category><category>non profits</category></item><item><title>NodeXL visualizations: conference Twitter posts and GitHub data</title><description>&lt;p&gt;&lt;a href="http://nodexl.codeplex.com/" target="_blank"&gt;NodeXL&lt;/a&gt; is a tool for working with &lt;strong&gt;network graphs&lt;/strong&gt;, developed as a free open-source template for Microsoft Excel. Examples of visualizations created by the community are stored in the &lt;a href="http://www.nodexlgraphgallery.org/" target="_blank"&gt;NodeXL Graph Gallery&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I found these two visualizations from the gallery particularly interesting:&lt;/p&gt;
&lt;h2&gt;Collective Intelligence Conference&lt;/h2&gt;
&lt;p&gt;A network graph based on Twitter data, &lt;a href="http://www.nodexlgraphgallery.org/Pages/Graph.aspx?graphID=597" target="_blank"&gt;created by Ben Shneiderman&lt;/a&gt;. It appears to show links between the Twitter accounts of people who posted about the Collective Intelligence conference, clustered with the Clauset-Newman-Moore algorithm.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.nodexlgraphgallery.org/Pages/Graph.aspx?graphID=597" target="_blank"&gt;&lt;img alt="" class="alignnone size-full" src="http://24.media.tumblr.com/tumblr_m5vio9GVLD1rz4uayo1_500.png" title="NodeXL-Twitter"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.nodexlgraphgallery.org/Pages/Graph.aspx?graphID=597" target="_blank"&gt;&lt;small&gt;[link to the gallery entry]&lt;/small&gt;&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Related: &lt;a href="http://scholarship.law.georgetown.edu/digitalpreservation_publications/5/" target="_blank"&gt;Anatomy of a Conference Twitter Hashtag: #AALL2010&lt;/a&gt; (PDF + data)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://t.co/bH3SSCb4" target="_blank"&gt;#jisccdd Hashtag Analysis in Google Docs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;h2&gt;GitHub: organizations and programming languages&lt;/h2&gt;
&lt;p&gt;The next visualization &lt;a href="http://www.nodexlgraphgallery.org/Pages/Graph.aspx?graphID=621" target="_blank"&gt;created by Eduarda Mendes Rodrigues&lt;/a&gt; shows links between GitHub organizations and programming languages. &amp;#8220;Each link between an organisation and a programming language indicates that the organization has at least 250 projects written in that language.&amp;#8221;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.nodexlgraphgallery.org/Pages/Graph.aspx?graphID=621" target="_blank"&gt;&lt;img alt="" class="alignnone size-full" src="http://25.media.tumblr.com/tumblr_m5vhm6dcke1rz4uayo1_500.png" title="NodeXL-GitHub"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.nodexlgraphgallery.org/Pages/Graph.aspx?graphID=621" target="_blank"&gt;&lt;small&gt;[link to the gallery entry]&lt;/small&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The results are sometimes unexpected, at least in terms of the visualization. CoffeeScript turned out to be much closer to PHP than to Ruby. Prolog appears near Java (perhaps because of Java libraries for logic programming?). Assembly landed somewhere between C and Ruby. Also, the &amp;#8220;at least 250 projects&amp;#8221; condition may seem somewhat artificial: only large organizations are considered.&lt;/p&gt;</description><link>http://traims.com/post/25432640865</link><guid>http://traims.com/post/25432640865</guid><pubDate>Tue, 19 Jun 2012 15:49:00 +0200</pubDate><category>Data analysis</category><category>Visualization</category><category>Tools</category></item><item><title>1.USA.gov: public data on shortened links to .gov and .mil websites</title><description>&lt;p&gt;I am currently reading an early release of &lt;a href="http://shop.oreilly.com/product/0636920023784.do" target="_blank"&gt;&amp;#8220;Python for Data Analysis&amp;#8221;&lt;/a&gt; written by the creator of &lt;a href="http://pandas.pydata.org/" target="_blank"&gt;pandas&lt;/a&gt;. It is an unfinished version, so some of the chapters are missing, and there are a lot of TODOs around. The full version is planned for release in October 2012. Nevertheless, it is quite interesting to read even now.&lt;/p&gt;
&lt;p&gt;From this book, I found out about &lt;a href="http://www.usa.gov/About/developer-resources/1usagov.shtml" target="_blank"&gt;1.USA.gov&lt;/a&gt;, open data about URLs in &lt;code&gt;.gov&lt;/code&gt; and &lt;code&gt;.mil&lt;/code&gt; domains shortened by &lt;a href="http://bit.ly" target="_blank"&gt;bit.ly&lt;/a&gt;. An anonymized record is created whenever anyone clicks a shortened link pointing to a &lt;code&gt;.gov&lt;/code&gt; or &lt;code&gt;.mil&lt;/code&gt; address.&lt;/p&gt;
&lt;p&gt;You can get this data from a live feed or download an hourly snapshot. A JSON entry looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{
  "a": "Mozilla\/5.0 (iPhone; CPU iPhone OS 5_0_1 like Mac OS X) AppleWebKit\/534.46 (KHTML, like Gecko) Mobile\/9A405 Twitter for iPhone",
  "c": "CA",
  "nk": 0,
  "tz": "America\/Winnipeg",
  "gr": "MB",
  "g": "JKZUHq",
  "h": "J8ZPYk",
  "l": "nasatwitter",
  "al": "en-us",
  "hh": "go.nasa.gov",
  "r": "https:\/\/twitter.com\/nasa\/status\/204658186135932930",
  "u": "http:\/\/www.nasa.gov\/mission_pages\/hinode\/eclipse_120520.html",
  "t": 1337634445,
  "hc": 1337629186,
  "cy": "Winnipeg",
  "ll": [ 49.883301, -97.166702 ]
}&lt;/code&gt;&lt;/pre&gt;
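&lt;p&gt;A minimal Python sketch of working with such records, assuming the hourly snapshot stores one JSON object per line. The field meanings are taken as assumptions from the 1.USA.gov documentation: &lt;code&gt;tz&lt;/code&gt; is the time zone, &lt;code&gt;t&lt;/code&gt; a Unix timestamp in seconds and &lt;code&gt;u&lt;/code&gt; the long URL. The helper names are hypothetical, chosen for illustration only.&lt;/p&gt;

```python
import json
from collections import Counter
from datetime import datetime, timezone
from urllib.parse import urlparse


def load_records(lines):
    """Parse one JSON object per non-empty line (assumed snapshot layout)."""
    return [json.loads(line) for line in lines if line.strip()]


def top_time_zones(records, n=10):
    """Most common 'tz' values; missing or empty time zones count as 'Unknown'."""
    counts = Counter(rec.get("tz") or "Unknown" for rec in records)
    return counts.most_common(n)


def click_time(record):
    """'t' holds seconds since the Unix epoch; return an aware UTC datetime."""
    return datetime.fromtimestamp(record["t"], tz=timezone.utc)


def share_of_clicks(records, domain):
    """Fraction of records whose long URL 'u' is on `domain` or a subdomain of it."""
    hosts = [urlparse(rec.get("u", "")).hostname or "" for rec in records]
    if not hosts:
        return 0.0
    hits = sum(1 for h in hosts if h == domain or h.endswith("." + domain))
    return hits / len(hosts)
```

&lt;p&gt;For the sample record above, &lt;code&gt;click_time&lt;/code&gt; gives 21:07:25 UTC on 21 May 2012. A figure like the share of clicks going to NASA could be estimated with &lt;code&gt;share_of_clicks(records, "nasa.gov")&lt;/code&gt;, although how bit.ly actually computed its statistics is not stated here.&lt;/p&gt;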
&lt;p&gt;Further information can be found in &lt;a href="http://www.usa.gov/About/developer-resources/1usagov.shtml" target="_blank"&gt;USA.gov Developer Resources&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Last year they even held a Hack Day to encourage people to dig into this data. Among the most interesting results were the &lt;a href="http://blog.bitly.com/post/7624585240/visualizing-the-nasa-shuttle-launch-with-public" target="_blank"&gt;NASA Shuttle Launch Visualization&lt;/a&gt; and the simple fact that 42% of all clicks lead to NASA websites. Visitors from Europe are more likely to go to NASA websites than to other sites. Still, I wonder what drives the collection of all this data.&lt;/p&gt;</description><link>http://traims.com/post/25432350499</link><guid>http://traims.com/post/25432350499</guid><pubDate>Tue, 19 Jun 2012 15:40:00 +0200</pubDate><category>Data analysis</category></item><item><title>Crawling the Web to Keep It Open</title><description>&lt;p&gt;&lt;strong&gt;Speaker: Lisa Green (&lt;a href="http://commoncrawl.org/" target="_blank"&gt;Common Crawl&lt;/a&gt;)&lt;/strong&gt; Common Crawl builds and maintains an open crawl of the web that anyone can use, with many possible applications in research and business.&lt;/p&gt;
&lt;blockquote&gt;&amp;#8220;Common Crawl aims to change the big data game with our repository of over 40 terabytes of high-quality web crawl information into the Amazon cloud, the net total of 5 billion crawled pages&amp;#8221;.&lt;/blockquote&gt;
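&lt;p&gt;Working with a crawl of this size typically means MapReduce; the first tutorial below uses Hadoop. As a rough single-process illustration of the map, shuffle and reduce phases (a sketch, not Common Crawl&amp;#8217;s code or API), the Python below counts crawled pages per host from a list of URLs. The function names and the input are hypothetical.&lt;/p&gt;

```python
from collections import defaultdict
from urllib.parse import urlparse


def map_page(url):
    """Map step: emit a (host, 1) pair for one crawled page."""
    host = urlparse(url).hostname or "unknown"
    yield host, 1


def shuffle(pairs):
    """Shuffle step: group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_host(host, counts):
    """Reduce step: sum the counts emitted for one host."""
    return host, sum(counts)


def pages_per_host(urls):
    """Run map, shuffle and reduce in a single process over an iterable of URLs."""
    pairs = (pair for url in urls for pair in map_page(url))
    return dict(reduce_host(key, values) for key, values in shuffle(pairs).items())
```

&lt;p&gt;For example, &lt;code&gt;pages_per_host(["http://example.com/a", "http://example.com/b"])&lt;/code&gt; returns &lt;code&gt;{"example.com": 2}&lt;/code&gt;. On Hadoop, the framework performs the shuffle and runs many mappers and reducers in parallel; only the map and reduce functions change from one question to the next.&lt;/p&gt;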
&lt;p&gt;Datasets are updated frequently. There are also tutorials, YouTube videos and a GitHub presence.&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;a href="http://commoncrawl.org/mapreduce-for-the-masses/trackback/" target="_blank"&gt;&amp;#8220;Map Reduce for the Masses: Zero to Hadoop in Five Minutes with Common Crawl&amp;#8221;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://commoncrawl.org/twelve-steps-to-running-your-ruby-code-across-five-billion-web-pages/trackback/" target="_blank"&gt;&amp;#8220;Twelve steps to running your Ruby code across five billion web pages&amp;#8221;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;The foundation is particularly interested in working with academia. &lt;a href="http://webdatacommons.org/" target="_blank"&gt;Web Data Commons&lt;/a&gt; is a collaborative project with two German universities.&lt;/p&gt;</description><link>http://traims.com/post/25438489699</link><guid>http://traims.com/post/25438489699</guid><pubDate>Thu, 24 May 2012 18:12:00 +0200</pubDate></item><item><title>Liberating Data @ Wikipedia</title><description>&lt;p&gt;&lt;strong&gt;Speaker: Diederik van Liere (Wikimedia Foundation)&lt;/strong&gt; The ecosystem built around Wikipedia is huge. &amp;#8220;Open, share, attribute&amp;#8221; is in Wikimedia&amp;#8217;s DNA. But they could do a better job with data: releasing more datasets and developing tools and APIs so that people can use them. Wikimedia is a small foundation, so it needs to crowdsource. Data should be more machine-readable. &amp;#8220;All capitals of all countries in the world&amp;#8221;: this question can be answered by DBpedia, not by Wikipedia.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Tools:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;a href="https://github.com/whym/wikihadoop" target="_blank"&gt;Wikihadoop&lt;/a&gt; is a tool for processing compressed XML dumps of the Wikimedia projects. Output is the difference between two revisions: what added? what removed?&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/whym/RevDiffSearch" target="_blank"&gt;RevDiffSearch&lt;/a&gt; is a Lucene-based full-text search engine. Can answer questions such as &amp;#8220;What authors are adding geo-templates? Who is adding citations?&amp;#8221;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;KRAKEN &lt;/strong&gt;is the Wikimedia analytics system. Development in progress. Hadoop, Cassandra, Hive, Zookeeper.&lt;/li&gt;
&lt;li&gt;&lt;a href="http://reportcard.wmflabs.org/" target="_blank"&gt;Wikimedia Report Card&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Datasets:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;a href="http://dumps.wikimedia.org/" target="_blank"&gt;Wikimedia Dumps&lt;/a&gt;: pageviews, surveys, XML dumps.&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.datahub.io/dataset?q=wikimedia" target="_blank"&gt;&lt;a href="http://www.datahub.io/dataset?q=wikimedia" target="_blank"&gt;http://www.datahub.io/dataset?q=wikimedia&lt;/a&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description><link>http://traims.com/post/25438524298</link><guid>http://traims.com/post/25438524298</guid><pubDate>Thu, 24 May 2012 18:12:00 +0200</pubDate></item><item><title>Data Philanthropy: Case Studies</title><description>&lt;p&gt;&lt;strong&gt;Hosted by: Jake Porway (&lt;a href="http://datakind.org/" target="_blank"&gt;DataKind&lt;/a&gt;)&lt;/strong&gt; DataKind organizes weekend projects, hackathons and nighttime projects to support NGOs. Why it works:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;For NGOs: better decision support. Small teams cannot afford to hire a data scientist.&lt;/li&gt;
&lt;li&gt;For software engineers: helping to make the world a better place is more exciting than coding typical hackathon apps (such as parking apps).&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;Three case studies with three different organizations:&lt;/p&gt;
&lt;ol&gt;&lt;li&gt;&lt;strong&gt;Community Knowledge Worker initiative&lt;/strong&gt; Speaker: Emily Tucker (&lt;a href="http://www.grameenfoundation.org/" target="_blank"&gt;Grameen Foundation&lt;/a&gt;). Farmers in poor rural areas lack basic information such as market prices or weather forecasts. Grameen Foundation recruits people directly from the community, who &lt;a href="http://grameenfoundation.wordpress.com/2012/04/03/power-for-ckws-in-uganda/" target="_blank"&gt;are trained to use a smartphone&lt;/a&gt;. These &amp;#8220;knowledge workers&amp;#8221; then &lt;a href="http://www.grameenfoundation.applab.org/blog/stories-from-the-field-what-do-our-community-knowledge-workers-do.html" target="_blank"&gt;literally go from farmer to farmer&lt;/a&gt; to share information. Grameen collects a lot of data: GPS locations, demographic information on participants, the size of their farms. This data can help estimate probabilities (What is more likely for families with 11 children? Are bicycles too expensive?). As they cannot afford a data scientist, they &lt;a href="http://datakind.org/2011/12/grameen-foundation/" target="_blank"&gt;work with DataKind&lt;/a&gt; to identify very specific questions that support concrete decisions about how to improve things. Even a weekend project can have a real impact. &lt;small&gt;Related: &lt;a href="http://www.slideshare.net/hyperiondevelopment/using-data-to-drive-decision-making" target="_blank"&gt;Slides: Using Data to Drive Decision Making, March 15, 2012. Emily Tucker, Grameen Foundation&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Helping children to reach their full potential&lt;/strong&gt; Speaker: HyeSook Chung (&lt;a href="http://www.dcactionforchildren.org/content/about-us" target="_blank"&gt;DC Action for Children&lt;/a&gt;). &amp;#8220;In areas of concentrated poverty, the traditional way of thinking will not take us very far. We need an untraditional approach.&amp;#8221; Their choice is a data-driven approach: depending on where a child lives, we can predict how well the child will do in life. This can push the city to rethink how it allocates resources. DataKind and its volunteers created &lt;a href="http://radar.oreilly.com/2012/03/visualization-washington-dc-schools-neighborhoods-kids.html" target="_blank"&gt;a prototype visualization of conditions in different neighbourhoods&lt;/a&gt; of Washington, D.C. It maps childcare facilities, libraries and other relevant information that helps answer the question: is this neighbourhood friendly for raising children?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Identifying high-performing non-profits&lt;/strong&gt; Speaker: Braz Brandt (&lt;a href="http://www.guidestar.org/" target="_blank"&gt;GuideStar USA&lt;/a&gt;). GuideStar gathers and publishes information on non-profits. There is a lot of data on non-profits, and the goal is to make it available online. Predictive models for non-profits can be built from the data collected on them. The ultimate goal is to identify high-performing non-profits: &amp;#8220;Are they able to use the money effectively? Are they likely to fail?&amp;#8221; If we support the best organizations, they are more likely to make change happen in the world.&lt;/li&gt;
&lt;/ol&gt;</description><link>http://traims.com/post/25438405114</link><guid>http://traims.com/post/25438405114</guid><pubDate>Thu, 24 May 2012 18:10:00 +0200</pubDate><category>community</category><category>data analysis</category><category>non profits</category></item><item><title>Open Knowledge Foundation: Open Tools for Open Data</title><description>&lt;p&gt;&lt;strong&gt;Speaker: Rufus Pollock (&lt;a href="http://okfn.org/" target="_blank"&gt;Open Knowledge Foundation&lt;/a&gt;)&lt;/strong&gt; The Open Knowledge Foundation is a not-for-profit organization founded in 2004 and particularly active in Europe. It focuses on open data and takes a problem-oriented approach: using digital information more effectively to improve governance, research and the economy, and to build more sustainable cities. &lt;strong&gt;Challenges&lt;/strong&gt;: tools and datasets are difficult to find and use, and datasets do not benefit from wider community engagement. They &lt;strong&gt;developed their own tools&lt;/strong&gt; to address these challenges. &lt;strong&gt;Tools&lt;/strong&gt;: &lt;a href="http://ckan.org/" target="_blank"&gt;Comprehensive Knowledge Archive Network (CKAN) Software&lt;/a&gt;:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;CKAN is a &lt;strong&gt;content management system&lt;/strong&gt; that makes it easy to publish, find and use data.&lt;/li&gt;
&lt;li&gt;Free and open-source software: you can download it and host it on your own servers.&lt;/li&gt;
&lt;li&gt;Source code is available in the &lt;a href="http://pypi.python.org/pypi/ckan/1.4.3" target="_blank"&gt;Python Package Index&lt;/a&gt;. Documentation is &lt;a href="http://readthedocs.org/projects/ckan/" target="_blank"&gt;here&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Extensible and flexible; add-ons for stats, analytics, data APIs etc.&lt;/li&gt;
&lt;li&gt;Dozens of CKAN deployments around the world: &lt;a href="http://data.gov.uk/" title="data.gov.uk" target="_blank"&gt;data.gov.uk&lt;/a&gt;, DataHub.io &amp;#8230;&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;&lt;a href="http://datahub.io/" target="_blank"&gt;DataHub.io&lt;/a&gt; is &lt;strong&gt;community data hub&lt;/strong&gt;powered by CKAN. Like GitHub is for code, DataHub is for data.&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;publish a dataset: upload a file (or use CKAN API&amp;#8230;)&lt;/li&gt;
&lt;li&gt;search for datasets&lt;/li&gt;
&lt;li&gt;visualize: charts, geomaps&amp;#8230; (search by region)&lt;/li&gt;
&lt;li&gt;data API for external data access (documentation available at &lt;a href="http://docs.ckan.org" target="_blank"&gt;docs.ckan.org&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;&lt;a href="http://reclinejs.com/" target="_blank"&gt;Recline.js&lt;/a&gt;is a stand-alone JavaScript library for data vizualization. Built with Backbone.js.&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Data Explorer: &amp;#8220;combines a data grid, Google Refine-style data transforms and visualizations.&amp;#8221;&lt;/li&gt;
&lt;li&gt;A library of data components: &amp;#8220;data grid, graphing, and data connectors.&amp;#8221;&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;Other projects:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;a href="http://openspending.org/" target="_blank"&gt;OpenSpending.org&lt;/a&gt;: financial information from different countries and sources.&lt;/li&gt;
&lt;li&gt;&lt;a href="http://wheredoesmymoneygo.org/" target="_blank"&gt;Where Does My Money Go (UK)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://blog.okfn.org/2012/05/21/kick-starting-the-school-of-data/" target="_blank"&gt;School of Data&lt;/a&gt;: community-oriented project; kick-off meeting May 24 (in Berlin and online).
&lt;blockquote&gt;The School will provide online training for data ‘wrangling’ skills – the ability to find, retrieve, clean, manipulate, analyze, and represent different types of data. Everyone is welcome, and we will try to have a range of hands-on activities to suit everyone’s interests.&lt;/blockquote&gt;
&lt;small&gt;&lt;a href="http://schoolofdata.org/frequently-asked-questions/" target="_blank"&gt;Frequently Asked Questions about School of Data&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;
&lt;/ul&gt;</description><link>http://traims.com/post/25432819731</link><guid>http://traims.com/post/25432819731</guid><pubDate>Thu, 24 May 2012 15:55:00 +0200</pubDate><category>data analysis</category><category>open data</category><category>non-profits</category><category>tools</category><category>visualization</category></item><item><title>Narrative Science: Transforming data into stories</title><description>&lt;p&gt;&lt;strong&gt;Strata Online Conference, &lt;/strong&gt;&lt;strong&gt;Speaker: Kristian Hammond (&lt;a href="http://www.narrativescience.com/" target="_blank"&gt;Narrative Science&lt;/a&gt;)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Narrative Science&lt;/em&gt; transforms data into narrative content in plain English at an enormous scale. &lt;a href="http://www.narrativescience.com/?page_id=10" target="_blank"&gt;Here&lt;/a&gt; is a sketch of how it works. Right now &lt;em&gt;Narrative Science&lt;/em&gt; works with media companies and data companies, and is looking to move into government and public policy data.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Tools: &lt;/strong&gt; Python, MongoDB, MapReduce.&lt;/p&gt;
&lt;p&gt;So far, their toolset is for internal, not external use.&lt;/p&gt;
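&lt;p&gt;Since their toolset is internal, the following is only a toy illustration of the general idea of turning data into prose, not &lt;em&gt;Narrative Science&lt;/em&gt;&amp;#8217;s method: a Python sketch that chooses wording from simple rules over a hypothetical game-score record. The field names and the margin thresholds are invented for the example.&lt;/p&gt;

```python
def describe_game(game):
    """Render a hypothetical game record as one English sentence using fixed rules."""
    home, away = game["home"], game["away"]
    home_score, away_score = game["home_score"], game["away_score"]

    if home_score == away_score:
        return f"{home} and {away} drew {home_score}-{away_score}."

    # Order the teams so that the winner comes first in the sentence.
    if home_score > away_score:
        winner, loser, high, low = home, away, home_score, away_score
    else:
        winner, loser, high, low = away, home, away_score, home_score

    # Pick a verb from the winning margin (the thresholds are arbitrary).
    margin = high - low
    if margin == 1:
        verb = "edged"
    elif margin >= 5:
        verb = "crushed"
    else:
        verb = "beat"
    return f"{winner} {verb} {loser} {high}-{low}."
```

&lt;p&gt;For instance, a 3-2 home win for a team called Cubs over Mets yields &lt;code&gt;Cubs edged Mets 3-2.&lt;/code&gt; A production system would need far richer rules and more varied sentence structure than this.&lt;/p&gt;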

&lt;p&gt;&lt;small&gt;Media coverage of their business: &lt;a href="http://www.wired.com/gadgetlab/2012/04/can-an-algorithm-write-a-better-news-story-than-a-human-reporter/all/1" target="_blank"&gt;Wired&lt;/a&gt;, &lt;a href="http://radar.oreilly.com/2012/01/narrative-science-kristian-hammond-data-content-generation.html" target="_blank"&gt;O&amp;#8217;Reilly Radar&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;</description><link>http://traims.com/post/25432739520</link><guid>http://traims.com/post/25432739520</guid><pubDate>Thu, 24 May 2012 15:52:00 +0200</pubDate><category>strataconf</category></item></channel></rss>
