2020

Italy Coronavirus Outbreak: numbers and stats 2020/02/24

Italy NCOV-19 outbreak

For personal reasons I am trying to track the number of confirmed NCOV-19 cases in Italy, as well as the number of deaths (since I live in Italy, the personal reason is not difficult to guess…). I am therefore regularly monitoring news from official Italian sources such as “Regione Lombardia” and “Protezione Civile”.


2018

how to use PaletteR to automagically build palettes from pictures 2018/05/08

I live in Italy, and more precisely in Milan, a city known for fashion and design events. During a lunch break I was visiting the Pinacoteca di Brera, a 200-year-old museum. This museum is full of incredible paintings from the Renaissance period. During my visit I was particularly impressed by one of them: “La Vergine con il Bambino, angeli e Santi”, by Piero della Francesca.


2016

streamline your analyses linking R to sas and more: the workfloweR 2016/09/21

We all know R is the first choice for statistical analysis and data visualisation, but what about big data munging? The tidyverse (or should we say hadleyverse 😏) has done a lot in this field; nevertheless, this kind of activity is often handled in some other language. Moreover, sometimes you receive as input pieces of analysis performed in other languages or, what is worse, pieces of databases packed in proprietary formats (such as .dta, .xpt and others). So let’s assume you are an R enthusiast like I am and you do all of your work in R, reporting included: wouldn’t it be great to have some nitty-gritty way to merge all these languages into one streamlined workflow?


ggplot2 themes examples 2016/08/09

This short post is exactly what it seems: a showcase of all the themes available within the ggplot2 package. I was making such a list for myself (you know that feeling… “how would it look with this theme? Let’s try this one…”) and in the end I thought it could be useful for my readers. At the very least, this post will save you the time of trying all the different themes just to get a sense of how they look.


Euro 2016 analytics: Who's playing the toughest game? 2016/06/21

I am really enjoying the UEFA Euro 2016 football competition, not least because our national team has done pretty well so far. That’s why, after browsing the statistics section of the official EURO 2016 website for a while, I decided to do some analysis on the data they share (as of the 21st of June).

Just to be clear from the beginning: we are not talking about anything too rigorous, just some interesting questions with related answers gathered mainly through data visualisation.


Over 50 practical recipes for data analysis with R in one book 2016/05/11

Ah, writing a blog post! This is a pleasure I was forgetting, as you can guess from the publication date of the last post: it was around January... You may be wondering: what have you been doing all this time? Well, quite a lot indeed:

2015

ramazon: Deploy your Shiny App on AWS with a Function 2015/08/18

Because Afraus attracted a good deal of interest, last month I exceeded the limits of the shinyapps.io free plan.

That prompted me to move my Shiny app to an Amazon AWS instance.

Well, it was not so straightforward: even though there are plenty of tutorials around the web, every one of them seems to miss a part: upgrading the R version, removing the shiny-server examples… And even with all the information at hand it is still quite a long, error-prone process.

All this pain is removed by ramazon, an R package that I developed to take care of everything needed to deploy a Shiny app on an AWS instance. An early disclaimer for Windows users: only Apple OS X is supported at the moment.


Introducing Afraus: an Unsupervised Fraud Detection Algorithm 2015/07/02

The latest Report to the Nation published by ACFE states that, on average, fraud accounts for nearly 5% of companies’ revenues.


ACFE Infographic: typical organization loses 5% of their revenues for fraud

Projecting this figure onto world GDP, the “fraud country” would produce a GDP roughly 3 times greater than Canada’s.


How to add a live chat to your Shiny app 2015/05/11

As I am currently working on a Fraud Analytics web application based on Shiny (currently in beta, more on this blog later), I found myself asking: wouldn’t it be great to offer live chat support to my web application’s visitors?

It would indeed!

An ancient example of chatting - Camera degli Sposi, Andrea Mantegna, 1465-1474


Catching Fraud with Benford's law (and another Shiny App) 2015/02/06

In the early 1900s Frank Benford observed that ‘1’ appeared more frequently than any other digit as the first digit in his own book of logarithm tables.

More than one hundred years later, we can use this curious finding to look for fraud in populations of data.

What does ‘Benford’s Law’ stand for?

Around 1938 Frank Benford, a physicist at the General Electric research laboratories, observed that logarithmic tables were more worn on the first pages: was this a coincidence, or was it due to an actual prevalence of numbers with 1 as their first digit?
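As a rough illustration of the idea, Benford's law is usually stated as P(d) = log10(1 + 1/d) for a first digit d between 1 and 9. The R sketch below computes those expected frequencies and extracts the first significant digit from a numeric vector, so that observed and expected shares can be compared. The function names (benford_expected, first_digit, first_digit_share) are made up for this example and are not taken from the Shiny app described in the post.

```r
# Expected first-digit probabilities under Benford's law: P(d) = log10(1 + 1/d)
benford_expected <- function() {
  d <- 1:9
  setNames(log10(1 + 1 / d), d)
}

# First significant digit of each finite, non-zero number
first_digit <- function(x) {
  x <- as.numeric(x)
  x <- abs(x[is.finite(x) & x != 0])
  # In scientific notation the first character is the first significant digit
  as.integer(substr(formatC(x, format = "e", digits = 15), 1, 1))
}

# Observed share of each first digit, to be compared with benford_expected()
first_digit_share <- function(x) {
  counts <- table(factor(first_digit(x), levels = 1:9))
  setNames(as.numeric(counts) / sum(counts), 1:9)
}

round(benford_expected(), 3)
```

Under this formula the digit 1 is expected to lead about 30% of the time, while 9 leads less than 5% of the time; a population of amounts whose first_digit_share() departs strongly from that pattern may deserve a closer look.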


2014

Network Visualisation With R 2014/12/05

The main reason why

After all, I am still an Internal Auditor. Therefore I often face one of the typical internal auditor’s problems: understanding the links between people and companies, in order to discover hidden communities that could expose the company to unknown risks.

The solution: Linker

In order to address this problem I am developing Linker, a lean Shiny app that takes one-to-one links as input and returns a network map as output:


Answering to Ben ( functions comparison in R) 2014/09/13

Following the post about %in% operator, I received this tweet: https://twitter.com/benwhite21/status/510520550553165824

I had a look at the code kindly provided by Ben and then asked myself: I know dplyr is a really nice package, but which snippet is faster?

To answer the question I’ve put the two snippets into two functions:

library(dplyr)

# Ben's snippet
dplyr_snippet <- function(object, column, vector) {
  filter(object, object[, column] %in% vector)
}

# AC's snippet
Rbase_snippet <- function(object, column, vector) {
  object[object[, column] %in% vector, ]
}

Then, thanks to the great microbenchmark package, I compared the two functions, timing the execution of each 100,000 times.
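A minimal sketch of how such a comparison could be set up is shown here. The data frame, the selected values and the number of repetitions (100 rather than 100,000, to keep it quick) are assumptions made for illustration, and the benchmark only runs if the microbenchmark and dplyr packages are installed.

```r
# Base R version of the subsetting function
Rbase_snippet <- function(object, column, vector) {
  object[object[, column] %in% vector, ]
}

# Made-up data for the sketch: 1000 deals for customers "a" to "z"
set.seed(1)
sales <- data.frame(
  customer = sample(letters, 1000, replace = TRUE),
  amount = runif(1000),
  stringsAsFactors = FALSE
)
selected <- c("a", "b", "c")

# Run the timing comparison only when both packages are available
if (requireNamespace("microbenchmark", quietly = TRUE) &&
    requireNamespace("dplyr", quietly = TRUE)) {
  dplyr_snippet <- function(object, column, vector) {
    dplyr::filter(object, object[, column] %in% vector)
  }
  timings <- microbenchmark::microbenchmark(
    dplyr = dplyr_snippet(sales, "customer", selected),
    base = Rbase_snippet(sales, "customer", selected),
    times = 100L
  )
  print(timings)
}
```

Timings depend heavily on the machine, the package versions and the size of the data, so the result of a run like this should be read as indicative only.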


Code snippet: subsetting data frame in R by vector 2014/09/02

Problem:

You have to subset a data frame, using as the criterion an exact match against the contents of a vector.

For instance:

You have a dataset with some attributes, and you have a vector with some values of one of those attributes. You want to filter the dataset based on the values in the vector.

Example: sales records, each record is a deal.

The vector is a list of selected customers you are interested in.
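The scenario above could be sketched in base R with the %in% operator, as follows. The sales data frame, its column names and the customer names are invented for this example.

```r
# Made-up sales records: each row is a deal
sales <- data.frame(
  customer = c("Alfa", "Beta", "Gamma", "Alfa", "Delta"),
  amount = c(100, 250, 75, 300, 120),
  stringsAsFactors = FALSE
)

# The vector of selected customers we are interested in
selected_customers <- c("Alfa", "Delta")

# Keep only the deals whose customer exactly matches a value in the vector
selected_deals <- sales[sales$customer %in% selected_customers, ]
selected_deals
```

The logical vector produced by %in% has one element per row, so it can be used directly as a row index; the trailing comma keeps all columns.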


Saturation with Parallel Computation in R 2014/07/28

I have just maxed out my whole PC:

the 4 GB of RAM is full


and so is the CPU (i7-4770 @ 3.4 GHz)


Parallel Computation in R

What is my secret?

The doParallel package for R on Mac

The package lets you run some very useful parallel computations, giving you the possibility to use the full potential of your CPU.

As a matter of fact, by default R uses just one of the cores available on your PC.
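A minimal sketch of what this can look like is given below. The toy task (squaring the numbers 1 to 8) and the choice to leave one core free are assumptions made for illustration; the parallel part only runs if the doParallel and foreach packages are installed, and otherwise the same result is computed sequentially.

```r
library(parallel)

# Number of workers: all detected cores but one (detectCores() may return NA)
n_cores <- max(1L, detectCores() - 1L, na.rm = TRUE)

if (requireNamespace("doParallel", quietly = TRUE) &&
    requireNamespace("foreach", quietly = TRUE)) {
  library(foreach)
  cl <- makeCluster(n_cores)          # start the worker processes
  doParallel::registerDoParallel(cl)  # register them as the foreach backend
  # Each iteration may run on a different worker; results are combined with c()
  squares <- foreach(i = 1:8, .combine = c) %dopar% i^2
  stopCluster(cl)                     # always release the workers
} else {
  # Sequential fallback when the packages are not available
  squares <- sapply(1:8, function(i) i^2)
}

squares
```

For a task this small the overhead of starting workers outweighs any gain; parallel execution pays off when each iteration does a substantial amount of work.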


How to Visualize Entertainment Expenditures on a Bubble Chart 2014/07/12

Bubble Chart Categorical Variables Expenditures Analytics

I’ve recently been asked to analyze some Board entertainment expenditures in order to gain sufficient assurance about their nature and about who was responsible for them.

In response to that request I have developed a little Shiny app with an interesting reactive Bubble chart.

The plot, made using the ggplot2 package, is composed of: a categorical x value, represented by the clusters identified in the expenditures population; a numerical y value, representing the total amount spent; and points defined by the total amount of expenditure in the given cluster for each company subject. Moreover, point size is given by the ratio between the amount regularly passed through the Account Receivable Process and the total amount of expenditure for that subject in that cluster.
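A chart with this structure could be sketched in ggplot2 roughly as follows. The data frame, its column names and all of its values are invented for illustration and do not come from the actual analysis; the plot is only drawn if ggplot2 is installed.

```r
# Made-up summary: one row per company subject and expenditure cluster
expenses <- data.frame(
  subject = c("A", "B", "C", "A", "B", "C"),
  cluster = factor(c("travel", "travel", "travel", "meals", "meals", "meals")),
  total_amount = c(1200, 800, 450, 300, 950, 600),
  regular_amount = c(1200, 400, 450, 150, 950, 300)
)

# Share of each total that went through the regular process (drives point size)
expenses$regular_ratio <- expenses$regular_amount / expenses$total_amount

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  bubble_plot <- ggplot(
    expenses,
    aes(x = cluster, y = total_amount, size = regular_ratio, colour = subject)
  ) +
    geom_point(alpha = 0.6) +
    labs(x = "Expenditure cluster", y = "Total amount", size = "Regular share")
  print(bubble_plot)
}
```

Mapping the ratio to size makes small bubbles stand out as the subjects whose spending mostly bypassed the regular process, which is presumably what an auditor would want to spot first.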
