Data Analytics - Andrea Cirillo

2020

Italy Coronavirus Outbreak: numbers and stats 2020/02/24

Contents: Italy NCOV-19 outbreak; number of confirmed cases and deaths in Italy; daily change in confirmed cases in Italy; daily confirmed cases in Italy; daily change in confirmed cases in Lombardy; comparison between Lombardy and Hubei province; daily new cases; ICU numbers.

Italy NCOV-19 outbreak: For personal reasons I am trying to track the number of NCOV-19 confirmed cases in Italy as well as the number of deaths (since I live in Italy, it is not difficult to guess the personal reason…).

tags: dataviz /data analysis /data analytics /

2018

how to use PaletteR to automagically build palettes from pictures 2018/05/08

Contents: Introducing paletter; Installing paletter; Creating a palette from your image; Functional specification; Reading a picture into the RGB colourspace; Processing the RGB image through kmeans; Moving to the hsv colours space; Removing outliers; Optimising palette; How to apply paletteR in ggplot2; Join us.

I live in Italy, and more precisely in Milan, a city known for fashion and design events. During a lunch break I was visiting the Pinacoteca di Brera, a two-centuries-old museum.
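The steps listed for paletteR (RGB pixels, k-means, a move to the HSV colour space) can be illustrated with a small base R sketch. This is not paletteR's own code: the synthetic pixel matrix stands in for a real picture, and the names `centres`, `pixels`, `palette_hex` and `hsv_centres` are made up for the example.

```r
set.seed(42)

# Stand-in for an image: 300 synthetic pixels scattered around three colours.
# A real picture would first be read into an n x 3 matrix of RGB values in [0, 1].
centres <- rbind(c(0.8, 0.1, 0.1), c(0.1, 0.6, 0.2), c(0.1, 0.2, 0.7))
pixels <- centres[rep(1:3, each = 100), ] + matrix(rnorm(900, sd = 0.03), ncol = 3)
pixels <- pmin(pmax(pixels, 0), 1)  # clamp to the valid RGB range
colnames(pixels) <- c("r", "g", "b")

# Cluster the pixels: each cluster centre is a candidate palette colour
fit <- kmeans(pixels, centers = 3, nstart = 10)

# Hex codes usable in plotting functions
palette_hex <- rgb(fit$centers[, "r"], fit$centers[, "g"], fit$centers[, "b"])

# The same centres in the HSV colour space (rows h, s, v; one column per colour)
hsv_centres <- rgb2hsv(t(fit$centers), maxColorValue = 1)

palette_hex
hsv_centres
```

Working in HSV makes it easier to filter or order candidate colours by saturation and brightness, which is presumably why the post moves there before removing outliers.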

tags: paletteR /R /Rstudio /dataviz /data analysis /data analytics /my_packages /

2016

streamline your analyses linking R to sas and more: the workfloweR 2016/09/21

We all know R is the first choice for statistical analysis and data visualisation, but what about big data munging? The tidyverse (or should we say hadleyverse 😏) has done a lot in this field; nevertheless, this kind of activity is often handled in some other coding language. Moreover, sometimes you get as input pieces of analysis performed in other languages or, what is worse, pieces of databases packed in a proprietary format (like .

tags: analytics /data analysis /data analytics /programming /R /sas /shiny /

Euro 2016 analytics: Who's playing the toughest game? 2016/06/21

I am really enjoying the UEFA Euro 2016 football competition, not least because our national team has done pretty well so far. That’s why, after browsing the statistics section of the official EURO 2016 website for a while, I decided to do some analysis on the data they share (as at the 21st of June). Just to be clear from the beginning: we are not talking about anything too rigorous, just some interesting questions with related answers gathered mainly through data visualisation.

tags: analytics /data /data analysis /data analytics /Github /R /Rstudio /soccer /

Over 50 practical recipes for data analysis with R in one book 2016/05/11

Ah, writing a blog post! This is a pleasure I was forgetting, and you can guess it by looking at the publication date of the last post: it was around January... You may be wondering: what have you done in all this time? Well, quite a lot indeed: changed my job (I am now working @ Intesa Sanpaolo Banking Group on Basel III statistical models), became a dad for the third time (and if you are guessing, it’s a boy!

tags: algorithm /analytics /apps /computer science /data analysis /data analytics /Github /R /Rstudio /shiny /shiny apps /social media /social media analytics /tutorials /web query /

2015

how to list loaded packages in R: ramazon gets cleaver 2015/09/10

It was around midnight here in Italy: I shared the code on Github, published a post on G+, Linkedin and Twitter and then went to bed. Over the next few hours things grew by themselves, with pleasant results like the following: https://twitter.com/DoodlingData/status/635057258888605696 The R community found ramazon a really helpful package. And I actually think it is: Amazon AWS is nowadays one of the most common tools for hosting web applications and websites.

tags: algorithm /amazon /analytics /apps /aws /data analytics /R /Rstudio /shiny /shiny apps /

Introducing Afraus: an Unsupervised Fraud Detection Algorithm 2015/07/02

The last Report to the Nation published by ACFE stated that, on average, fraud accounts for nearly 5% of companies’ revenues. Projecting this number onto the whole world GDP, the “fraud-country” produces a GDP roughly 3 times greater than the Canadian GDP.

tags: algorithm /analytics /apps /computer science /data /data analysis /data analytics /fraud /fraud analytics /internal audit /R /shiny /shiny apps /

Catching Fraud with Benford's law (and another Shiny App) 2015/02/06

In the early 1900s Frank Benford observed that ‘1’ was more frequent as a first digit in his own logarithm manual. More than one hundred years later, we can use this curious finding to look for fraud in populations of data. Just give the shiny app a try.

Contents: What ‘Benford’s Law’ stands for?; Nice stuff, but what can I do with Benford’s Law?; You can find fraud with it; Some precautions; BenfordeR: another lean shiny application; performing a benford analysis; plotting results; detecting suspected records; What’s next.
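Benford's law gives the expected share of each leading digit d as log10(1 + 1/d), so ‘1’ should lead about 30% of the time. The base R sketch below compares that expectation with observed first digits. It is an illustration rather than the BenfordeR code: the helper `first_digit` and the sample data are hypothetical.

```r
# Expected Benford frequencies for leading digits 1..9: log10(1 + 1/d)
benford_expected <- log10(1 + 1 / (1:9))
names(benford_expected) <- 1:9

# Hypothetical helper: first significant digit of each non-zero finite number
first_digit <- function(x) {
  x <- abs(x[x != 0 & is.finite(x)])
  floor(x / 10^floor(log10(x)))
}

# Illustrative data only: powers of 2 follow Benford's law closely
amounts <- 2^(1:200)

# Observed share of each leading digit
observed <- table(factor(first_digit(amounts), levels = 1:9)) / length(amounts)

# Side-by-side comparison of observed and expected shares
round(rbind(observed = as.numeric(observed), expected = benford_expected), 3)
```

On real accounting data, digits whose observed share departs markedly from the expected one are the natural starting point for a closer look at the underlying records.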

tags: algorithm /analytics /data /data analysis /data analytics /internal audit /R /shiny /shiny apps /

2014

Network Visualisation With R 2014/12/05

The main reason why: after all, I am still an Internal Auditor. Therefore I often face one of the typical internal auditor's problems: understanding links between people and companies, in order to discover hidden communities that could expose the company to unknown risks. The solution: Linker. To address this problem I am developing Linker, a lean shiny app that takes 1-to-1 links as input and gives a network map as output:

tags: analytics /communities /data analysis /data analytics /internal audit /Linker /network analysis /R /

Best Practices for Scientific Computing 2014/11/05

I reproduce below the principles from the amazing paper Best Practices for Scientific Computing, published in 2012 by a group of US and UK professors. The main purpose of the paper is to “teach” good programming habits, shared by professional developers, to people who weren’t born developers and became developers just for professional purposes.

“Scientists spend an increasing amount of time building and using software. However, most scientists are never taught how to do this efficiently.”

Best Practices for Scientific Computing

Write programs for people, not computers.

tags: analytics /computer science /data analytics /R /

excel right() function in R 2014/10/27

As part of the Excel functions in R, I have developed this custom function, reproducing the Excel right() function in the R language. Feel free to copy and use it.

[code language="r"]
# Return the last `char` characters of `string`
right = function (string, char){
  substr(string, nchar(string) - (char - 1), nchar(string))
}
[/code]

You can find other functions in the Excel functions in R post.

tags: analytics /data /data analysis /data analytics /excel /excel spreadsheet /functions /R /

excel left() function in R 2014/10/27

As part of the Excel functions in R, I have developed this custom function, emulating the Excel left() function in the R language. Feel free to copy and use it.

[code language="r"]
# Return the first `char` characters of `string`
left = function (string, char){
  substr(string, 1, char)
}
[/code]

You can find other functions in the Excel functions in R post.
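A quick usage sketch for left() and right() follows. The definitions are repeated so the snippet runs on its own, and the sample strings are illustrative only. Because substr() and nchar() are vectorised, both helpers work on whole character vectors, much as the Excel functions are filled down a column.

```r
# Definitions repeated from the posts so the snippet is self-contained
left  <- function(string, char) substr(string, 1, char)
right <- function(string, char) substr(string, nchar(string) - (char - 1), nchar(string))

codes <- c("IT-2014-001", "UK-2015-042")  # illustrative values only

prefixes <- left(codes, 2)   # first two characters of each code
counters <- right(codes, 3)  # last three characters of each code

print(prefixes)
print(counters)
```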

tags: analytics /data /data analysis /data analytics /excel /functions /R /

Answering to Ben ( functions comparison in R) 2014/09/13

Following the post about the %in% operator, I received this tweet: https://twitter.com/benwhite21/status/510520550553165824 I had a look at the code kindly provided by Ben and then asked myself: I know dplyr is a really nice package, but which snippet is faster? To answer the question I put the two snippets in two functions:

[code language="r"]
library(dplyr)  # provides filter()

# Ben snippet
dplyr_snippet = function(object, column, vector){
  filter(object, object[, column] %in% vector)
}

# AC snippet
Rbase_snippet = function(object, column, vector){
  object[object[, column] %in% vector, ]
}
[/code]

Then, thanks to the great package microbenchmark, I compared the two functions, testing the execution time of both, for 100.

tags: analytics /data analysis /data analytics /R /

How to Visualize Entertainment Expenditures on a Bubble Chart 2014/07/12

I was recently asked to analyze some Board entertainment expenditures in order to acquire sufficient assurance about their nature and who was responsible for them. In response to that request I developed a little Shiny app with an interesting reactive bubble chart. The plot, made using the ggplot2 package, is composed of: a categorical x value, represented by the clusters identified in the expenditures population; a numerical y value, representing the total amount spent; and points defined by the total amount of expenditure in the given cluster for each company subject.

tags: analytics /data analysis /data analytics /R /shiny apps /