Personal thoughts about the technical stuff I have encountered during my journey. You can also select a specific tag to read all posts related to it.

2020

Italy Coronavirus Outbreak: numbers and stats 2020/02/24

Contents: Italy NCOV-19 outbreak | number of confirmed cases and deaths in Italy | daily change in confirmed cases in Italy | daily confirmed cases in Italy | daily change in confirmed cases in Lombardy | comparison between Lombardy and Hubei province | daily new cases | ICU numbers

For personal reasons I am trying to track the number of NCOV-19 confirmed cases in Italy as well as the number of deaths (since I live in Italy, it is not difficult to guess the personal reason…).

2019

Introducing the paletteR Gallery 2019/11/26

Contents: celebrating beauty | add your own masterpiece | discover more about paletteR

PaletteR, the package that lets you create an optimized palette from an image, has been around for nearly two years, and #rstats users have made a lot of great stuff with it. I therefore took the time to collect what I found around the web, just to celebrate this beauty. You can find the results below in a slideshow.

a quick ride on pagedown: create PDFs from Rmarkdown 2019/01/27

Contents: Do we really need a package like pagedown? | installing pagedown | selecting the preferred template | filling in and rendering the template | finally obtaining the desired PDF file | my suggestions for improvements

I recently came across the good talk about pagedown given by the always good Yihui Xie at the RStudio conference. You can watch it yourself on the RStudio website or by clicking the image below:

2018

introducing vizscorer: a bot advisor to improve your ggplot plots 2018/11/12

Contents: introducing vizscorer: a bot advisor to improve your ggplot plots | How to measure a good plot? | Preparing a training dataset of plots | How to train Machine Learning to recognize a good plot? | Can Machine learning talk back to humans? | Putting all together: vizscorer and the scorer_bot | Where to go from here and how to help

One of the most frustrating issues I face in my professional life is the plethora of ineffective reports generated within my company.

getting to know the new definition of default 2018/07/13

Most of my time at work is spent looking at those great pieces of statistical machinery that credit risk models are. That is why I was recently asked to prepare a short course about the new definition of default, which is going to be applied from 01/01/2021 on. It is quite an unexplored topic in the sector and I haven’t found much teaching material online, which is why I thought I would share here the brief deck of slides I produced for the occasion.

how to use PaletteR to automagically build palettes from pictures 2018/05/08

Contents: Introducing paletter | Installing paletter | Creating a palette from your image | Functional specification | Reading a picture into the RGB colourspace | Processing the RGB image through kmeans | Moving to the HSV colour space | Removing outliers | Optimising palette | How to apply paletteR in ggplot2 | Join us

I live in Italy, and more precisely in Milan, a city known for fashion and design events. During a lunch break I was visiting the Pinacoteca di Brera, a two-century-old museum.

UpdateR package: update R version with a function (on MAC OSX) 2018/03/10

Contents: Mac version of updateR function: the UpdateR package | how to install the updateR package | how to update R version using the updateR package | behind the scenes: how updateR works | verify that user is running a unix machine | get last R version from CRAN | run command line commands within R | accomplishments and further developments | feel free to complain with me

I personally really appreciate the installr package from Tal Galili, since it lets you install a great number of the tools needed for working with R just by running a function.

Learning Dataviz Principles and Theory from Tufte 2018/02/10

Contents: theory of data graphics | maximise data-ink ratio, within reason | maximise data density and the size of the data matrix, within reason | treat graphics as paragraphs and shape them appropriately | integrity of data graphics | always show data in their context | try to produce a small lie factor | show data variation, not design variation | use as many dimensions as the number of dimensions in your data | how to apply Tufte’s principles in R

I have recently finished a great read: Edward Tufte’s The Visual Display of Quantitative Information.

2017

Why are you being so silent? 2017/04/12

It has been nearly half a year since the last post about workflower came out. Why did I stay silent for so long? I have three major updates to explain the silence:

📚 The good guys at Packt Publishing asked me to write one more book about R and data mining; I suppose this is because the first one was well received.
📦 I spent my spare time working on updateR to get it ready to go on CRAN.

2016

streamline your analyses linking R to sas and more: the workfloweR 2016/09/21

We all know R is the first choice for statistical analysis and data visualisation, but what about big data munging? The tidyverse (or should we say hadleyverse 😏) has been doing a lot in this field; nevertheless, this kind of activity is often handled in some other language. Moreover, sometimes you receive as input pieces of analysis performed in other languages or, what is worse, pieces of databases packed in a proprietary format (like .

ggplot2 themes examples 2016/08/09

This short post is exactly what it seems: a showcase of all the themes available within the ggplot2 package. I was making such a list for myself (you know that feeling… “how would it look with this theme? let’s try this one…”) and in the end I thought it could be useful for my readers. At the very least, this post will save you the time of trying all the different themes just to get a sense of how they look.

Euro 2016 analytics: Who's playing the toughest game? 2016/06/21

I am really enjoying the UEFA Euro 2016 football competition, also because our national team has done pretty well so far. That’s why, after browsing the statistics section of the official EURO 2016 website for a while, I decided to do some analysis on the data they share (as at the 21st of June). Just to be clear from the beginning: we are not talking about anything too rigorous, just some interesting questions with related answers gathered mainly through data visualisation.

a Checklist for your weekly review (GTD methodology) 2016/05/17

I was crafting this checklist for my personal use, and then I found myself thinking: why shouldn’t I share this useful handful of bullets with my readers? So here we are: find below a useful checklist for your weekly review. The checklist is derived directly from the official GTD book by our great friend David Allen. The greatest quality of the checklist is its minimalist approach: only what you really need to read is written within each point, so that you get through your review as quickly as possible.

Over 50 practical recipes for data analysis with R in one book 2016/05/11

Ah, writing a blog post! This is a pleasure I was forgetting, and you can guess it by looking at the publication date of the last post: it was around January... You may be wondering: what have you done in all this time? Well, quite a lot indeed:

- changed my job (I am now working @ Intesa Sanpaolo Banking Group on Basel III statistical models)
- became a dad for the third time (and if you are guessing, it’s a boy!

2015 in review (let me boast myself a bit :)) 2016/01/01

The WordPress.com stats helper monkeys prepared a 2015 annual report for this blog. Here’s an excerpt: The concert hall at the Sydney Opera House holds 2,700 people. This blog was viewed about **8,800** times in 2015. If it were a concert at Sydney Opera House, it would take about 3 sold-out performances for that many people to see it. Click here to see the complete report.

2015

Rename a Data Frame Within a Function Passing an Argument 2015/12/14

This is not actually a real post but rather a code snippet surrounded by text. Nevertheless, I think it is quite a useful one: have you ever found yourself writing a function where a data frame is created, wanting to name that data frame based on a custom argument passed to the function? For instance, the output of your function is a really nice data frame named in a really trivial way, like “result”.
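
One way to get there in base R is `assign()`, which binds a value to a name supplied as a string. The sketch below is only an illustration under that assumption, not necessarily the snippet from the post; the helper `make_named_df` and its toy data are hypothetical.

```r
# Hypothetical helper: builds a data frame and binds it to a caller-chosen name.
# By default the object is created in the environment the function is called from.
make_named_df <- function(df_name, envir = parent.frame()) {
  result <- data.frame(id = 1:3, value = c(10, 20, 30))
  assign(df_name, result, envir = envir)  # bind `result` to the requested name
  invisible(result)
}

make_named_df("sales_summary")
print(sales_summary)
```

Passing the target environment explicitly keeps the side effect visible to whoever calls the function, which is usually safer than writing to the global environment unconditionally.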

how to list loaded packages in R: ramazon gets clever 2015/09/10

It was around midnight here in Italy: I shared the code on GitHub, published a post on G+, LinkedIn and Twitter and then went to bed. Over the next few hours things grew by themselves, with pleasant results like the following: https://twitter.com/DoodlingData/status/635057258888605696

The R community found ramazon a really helpful package. And I actually think it is: Amazon AWS is nowadays one of the most common tools for hosting online web applications and websites.

ramazon: Deploy your Shiny App on AWS with a Function 2015/08/18

Because Afraus attracted a good deal of interest, last month I exceeded the shinyapps.io free plan limits. That made me move my Shiny app to an Amazon AWS instance. Well, it was not so straightforward: even though there are plenty of tutorials around the web, every one seems to miss a part: upgrading the R version, removing the shiny-server examples… And even with all the information at hand, it is still quite a long, error-prone process.

Introducing Afraus: an Unsupervised Fraud Detection Algorithm 2015/07/02

The last Report to the Nation published by ACFE stated that, on average, fraud accounts for nearly 5% of companies’ revenues. Projecting this number onto the whole world GDP, the “fraud-country” turns out to produce a GDP something like 3 times greater than the Canadian GDP.

How to add a live chat to your Shiny app 2015/05/11

As I am currently working on a Fraud Analytics Web Application based on Shiny (currently in beta version, more later on this blog) I found myself asking: wouldn’t it be great to offer live chat support to my Web Application visitors? It would indeed!

[Image: an ancient example of chatting - Camera degli Sposi, Andrea Mantegna, 1465-1474]

But how to do it? Unfortunately, searching on Google didn’t give any useful result.

How to list files and folders within a folder (basic file app) 2015/04/01

I know, we are not talking about analytics and no, this is not going to establish me as a great data scientist… By the way: have you ever wondered how to list all files and folders within a root folder just by hitting a button? I have looked for something like that quite a few times, for instance when asked to write down an index of all the working papers pertaining to a specific audit (yes, **I am an auditor**, sorry about that): a really time-consuming and not really value-adding activity.

Catching Fraud with Benford's law (and another Shiny App) 2015/02/06

In the early 1900s Frank Benford observed that ‘1’ was more frequent as a first digit in his own logarithms manual. More than one hundred years later, we can use this curious finding to look for fraud in populations of data.

Contents: just give a try to the shiny app | What ‘Benford’s Law’ stands for? | Nice stuff, but what can I do with Benford’s Law? | You can find fraud with it | Some precautions | BenfordeR: another lean shiny application | performing a benford analysis | plotting results | detecting suspected records | What’s next
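
Benford's law says that a leading digit d is expected with probability log10(1 + 1/d). As a rough base R sketch of the idea (not the BenfordeR code itself), the expected shares can be compared with the first digits observed in a set of amounts. The helper `first_digit` and the `amounts` vector are made up for illustration.

```r
# Expected share of each leading digit under Benford's law
digits <- 1:9
benford_expected <- log10(1 + 1 / digits)

# Hypothetical helper: first significant digit of each non-zero, non-missing number.
# Scientific notation puts the first significant digit in the first character.
first_digit <- function(x) {
  x <- abs(x[!is.na(x) & x != 0])
  as.integer(substr(formatC(x, format = "e", digits = 6), 1, 1))
}

# Made-up amounts, only to show the mechanics
amounts <- c(123.5, 1890, 27, 1.04, 310, 1450, 96, 118, 2300, 17)
observed <- as.numeric(table(factor(first_digit(amounts), levels = digits))) /
  length(amounts)

comparison <- data.frame(
  digit = digits,
  expected = round(benford_expected, 3),
  observed = observed
)
print(comparison)
```

With real data, digits whose observed share sits far from the expected one are the places worth a closer look; a sample this small is far too short to draw any conclusion.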

2014

How to use Github with Rstudio: step-by-step tutorial 2014/12/28

Pushing to my GitHub repository directly from the RStudio project, avoiding that annoying “copy & paste” job: since this is one of the Best Practices for Scientific Computing, I have been struggling with this problem for a while. Now that I have managed to solve it, I think you may find the detailed tutorial that follows useful. I am not going to explain why you should use GitHub with your RStudio project, but if you are asking yourself that question, you may find a Stack Overflow discussion on the topic useful.

Network Visualisation With R 2014/12/05

The main reason why

After all, I am still an Internal Auditor. Therefore I often face one of the typical internal auditor’s problems: understanding links between people and companies, in order to discover hidden communities that could expose the company to unknown risks.

the solution: linker

In order to address this problem I am developing Linker, a lean Shiny app that takes 1-to-1 links as input and gives a network map as output:

Querying Google With R 2014/11/19

If you have a blog you may want to discover how your website is performing for given keywords on the Google search engine. As we all know, this topic is not a trivial one. The problem is that the manual solution would be quite time-consuming, requiring you to look for your website for every single keyword, across many, many pages. Feeling this way?

[Image: “Pain and fear, pain and fear for me” - Oliver Twist]

Best Practices for Scientific Computing 2014/11/05

I reproduce below the principles from the amazing paper Best Practices for Scientific Computing, published in 2012 by a group of US and UK professors. The main purpose of the paper is to “teach” the good programming habits of professional developers to people who were not born developers and became developers just for professional purposes.

“Scientists spend an increasing amount of time building and using software. However, most scientists are never taught how to do this efficiently”

Best Practices for Scientific Computing

Write programs for people, not computers.

download data to excel from web 2014/10/28

This simple tutorial will show you how to download data into an Excel spreadsheet by creating a web query.

Download data into Excel

1. select the “data” tab
2. select “from web”
3. input the desired web URL
4. click the “go” button
5. select the data you want to download
6. click the “import” button

Refresh downloaded data

1. select the “data” tab
2. select “connections”
3. select your connection
4. click the “refresh” button

excel functions in R 2014/10/25

I started my “data-journey” from Excel, getting excited by formulas like VLookup(), right() and left(). Then datasets got bigger, I discovered that little spreadsheets were not enough, and I looked for something bigger and stronger, eventually coming to R. But as you know, one never forgets one’s first love. So, for fun and for practice, I have written some Excel functions in R. I hope you will enjoy them.
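
To give a flavour of the idea, here is a minimal base R sketch of three Excel-style helpers. The function names and signatures (`left`, `right`, `vlookup`) and the sample data are assumptions made for illustration, not necessarily the ones written in the post.

```r
# LEFT(text, n): first n characters
left <- function(text, n) substr(text, 1, n)

# RIGHT(text, n): last n characters
right <- function(text, n) {
  len <- nchar(text)
  substr(text, len - n + 1, len)
}

# VLOOKUP with exact match: look lookup_value up in the first column of `table`
# and return the value found in column `col_index` (NA when there is no match)
vlookup <- function(lookup_value, table, col_index) {
  row <- match(lookup_value, table[[1]])
  table[[col_index]][row]
}

# Made-up lookup table
customers <- data.frame(
  code = c("A01", "B02", "C03"),
  name = c("Rossi", "Bianchi", "Verdi"),
  stringsAsFactors = FALSE
)

print(left("Milano", 3))
print(right("Milano", 3))
print(vlookup("B02", customers, 2))
```

All three helpers are vectorised over their first argument, so they can be applied to a whole column at once, much like dragging a formula down a spreadsheet.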

Answering Ben (functions comparison in R) 2014/09/13

Following the post about the %in% operator, I received this tweet: https://twitter.com/benwhite21/status/510520550553165824

I had a look at the code kindly provided by Ben and then asked myself: I know dplyr is a really nice package, but which snippet is faster? To answer the question I put the two snippets into two functions:

    library(dplyr)

    # Ben's snippet: filter rows with dplyr
    dplyr_snippet <- function(object, column, vector) {
      filter(object, object[, column] %in% vector)
    }

    # AC's snippet: subset rows with base R
    Rbase_snippet <- function(object, column, vector) {
      object[object[, column] %in% vector, ]
    }

Then, thanks to the great package microbenchmark, I made a comparison between those two functions, testing the time of execution of both, for 100.

How to Put Equations into Evernote 2014/09/11

Contents: Problem | Solution | Tutorial | 1. find Grapher among your applications | 2. write the equation | 3. copy the equation as TIFF | 4. paste the equation into Evernote | other ways to insert equations into Evernote

Problem

Some time ago I was looking for an easy way to put some math writing into my Evernote notes from my Mac. Even though there is no official solution to the problem and the feature request is still pending in the dedicated Evernote forum, I finally came up with a very simple way to solve the problem.

Code snippet: subsetting data frame in R by vector 2014/09/02

Problem: you have to subset a data frame, using as the criterion an exact match against the contents of a vector. For instance: you have a dataset with some attributes, and you have a vector with some values of one of those attributes. You want to filter the dataset based on the values in the vector. Example: sales records, where each record is a deal. The vector is a list of selected customers you are interested in.
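
A minimal base R sketch of this kind of filter uses the `%in%` operator inside the row index. The `sales` data frame and the `selected_customers` vector below are made up to mirror the sales example.

```r
# Made-up sales records: one row per deal
sales <- data.frame(
  customer = c("acme", "bolt", "acme", "core", "dyna"),
  amount = c(100, 250, 75, 300, 120),
  stringsAsFactors = FALSE
)

# Customers we are interested in
selected_customers <- c("acme", "core")

# Keep only the rows whose customer appears in the vector
selected_sales <- sales[sales$customer %in% selected_customers, ]
print(selected_sales)
```

`%in%` returns one TRUE/FALSE per row, so the trailing comma in the index matters: it tells R to filter rows while keeping every column.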

Saturation with Parallel Computation in R 2014/07/28

I have just saturated my whole PC: the 4 GB of RAM is full and so is the CPU (i7 4770 @ 3.4 GHz).

Parallel Computation in R

What is my secret? The doParallel package for R on Mac. The package lets you run some very useful parallel computations, giving you the possibility to use the full potential of your CPU. As a matter of fact, by default R uses just one of the cores available on your PC.

How to Visualize Entertainment Expenditures on a Bubble Chart 2014/07/12

I was recently asked to analyze some Board entertainment expenditures in order to acquire sufficient assurance about their nature and who was responsible for them. In response to that request I developed a little Shiny app with an interesting reactive bubble chart. The plot, made using the ggplot2 package, is composed of:

- a categorical x value, represented by the clusters identified in the expenditures population
- a numerical y value, representing the total amount spent
- points defined by the total amount of expenditure in the given cluster for each company subject.

0001

0001/01/01

I live in Italy, and more precisely in Milan, a city known for fashion and design events. During a lunch break I was visiting the Pinacoteca di Brera, a two-century-old museum. This museum is full of incredible paintings from the Renaissance period. During my visit I was particularly impressed by one of them: "La Vergine con il Bambino, angeli e Santi", by Piero della Francesca. If you see this painting you will find a profusion of colours with a great equilibrium between different hues, the bold use of complementary colours and the ability expressed in the "
