2018
introducing vizscorer: a bot advisor to improve your ggplot plots
2018/11/12
One of the most frustrating issues I face in my professional life is the abundance of ineffective reports generated within my company. Wherever I look, I see plenty of junk charts, like bar plots with useless 3D effects or ambiguous, crowded pie charts. I do understand the root causes of this sorry state of affairs: people have less and less time to dedicate to crafting reports, and even less to dedicate to their plots. In today's fast-paced working life, my colleagues have no time for learning data visualization principles. Even so, this remains quite a big problem, since poorly crafted reports and plots waste a lot of time and money:
tags:
R /algorithm /analytics /dataviz /my_packages /
UpdateR package: update R version with a function (on Mac OS X)
2018/03/10
I personally really appreciate the installr package from Tal Galili, since it lets you install a great number of the tools needed for working with R just by running a function.
tags:
algorithm /analytics /R /my_packages /
2017
Why are you being so silent?
2017/04/12
It has been nearly half a year since the last post about workfloweR came out: why did I stay silent for so long?
I have three major updates to explain the silence:
2016
streamline your analyses linking R to sas and more: the workfloweR
2016/09/21
We all know R is the first choice for statistical analysis and data visualisation, but what about big data munging? The tidyverse (or we'd better say hadleyverse) has been doing a lot in this field; nevertheless, this kind of activity is often handled in some other coding language. Moreover, sometimes you receive as input pieces of analysis performed in other languages or, what is worse, pieces of databases packed in proprietary formats (like .dta, .xpt and others). So let's assume you are an R enthusiast like I am, and you do all of your work with R, reporting included: wouldn't it be great to have some nitty-gritty way to merge all these languages together in a streamlined workflow?
tags:
analytics /data analysis /data analytics /programming /R /sas /shiny /
ggplot2 themes examples
2016/08/09
This short post is exactly what it seems: a showcase of all the themes available within the ggplot2 package. I was making such a list for myself (you know that feeling: "how would it look with this theme? let's try this one…") and in the end I thought it could be useful for my readers. At the very least, this post will save you the time of trying all the different themes just to get a sense of how they look.
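A loop along these lines can generate such a showcase. It is only a sketch: the `mtcars` scatter plot and the variable names are chosen here for illustration and are not necessarily what the original post used; the theme functions themselves all ship with ggplot2.

```r
library(ggplot2)

# Complete themes shipped with ggplot2 itself (no extra packages needed)
themes <- list(
  theme_grey     = theme_grey(),
  theme_bw       = theme_bw(),
  theme_linedraw = theme_linedraw(),
  theme_light    = theme_light(),
  theme_dark     = theme_dark(),
  theme_minimal  = theme_minimal(),
  theme_classic  = theme_classic(),
  theme_void     = theme_void()
)

# Illustrative base plot: any ggplot object would do
base_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# Apply each theme in turn, using the theme name as the plot title
plots <- lapply(names(themes), function(theme_name) {
  base_plot + themes[[theme_name]] + ggtitle(theme_name)
})
names(plots) <- names(themes)

# Display one of them, e.g. the minimal theme
plots[["theme_minimal"]]
```

Each element of `plots` can then be printed or saved with `ggsave()` to compare the themes side by side.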
tags:
analytics /data analysis /dataviz /ggplot /ggplot2 /ggthemes /hacking /plot /png /R /themes /
Euro 2016 analytics: Who's playing the toughest game?
2016/06/21
I am really enjoying the UEFA Euro 2016 football competition, not least because our national team has done pretty well so far. That's why, after browsing the statistics section of the official EURO 2016 website for a while, I decided to do some analysis on the data they share (as of the 21st of June).
Just to be clear from the beginning: we are not talking about anything too rigorous, just some interesting questions with related answers gathered mainly through data visualisation.
tags:
analytics /data /data analysis /data analytics /Github /R /Rstudio /soccer /
Over 50 practical recipes for data analysis with R in one book
2016/05/11
Ah, writing a blog post! This is a pleasure I was forgetting, as you can guess from the publication date of the last post: it was around January... You may be wondering: what have you been doing all this time? Well, quite a lot indeed:
2015
Rename a Data Frame Within a Function Passing an Argument
2015/12/14
This is not actually a real post but rather a code snippet surrounded by text.
Nevertheless I think it is a quite useful one: have you ever found yourself writing a function where a data frame is created, wanting to name that data frame based on a custom argument passed to the function?
For instance, the output of your function is a really nice data frame named in a really trivial way, like "result".
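One common way to do this in base R is `assign()`. The post's own snippet is not shown in this excerpt, so the following is only a sketch under that assumption; the function name `create_named_df` and its arguments are hypothetical.

```r
# Hypothetical helper: builds a data frame and stores it under a custom name
create_named_df <- function(df_name, n = 3, envir = parent.frame()) {
  stopifnot(is.character(df_name), length(df_name) == 1)

  # Inside the function the data frame still has a trivial name
  result <- data.frame(id = seq_len(n), value = seq_len(n)^2)

  # assign() binds `result` to the name held in df_name,
  # in the caller's environment by default
  assign(df_name, result, envir = envir)
  invisible(result)
}

create_named_df("sales_summary")
sales_summary
```

Returning the data frame and letting the caller name it (`sales_summary <- f()`) is usually cleaner; `assign()` is mainly handy when the name is only known at run time.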
tags:
analytics /R /Rstudio /
how to list loaded packages in R: ramazon gets clever
2015/09/10
It was around midnight here in Italy:
I shared the code on Github, published a post on G+, Linkedin and Twitter and then went to bed.
Over the next few hours things kept growing by themselves, with pleasant results like the following:
https://twitter.com/DoodlingData/status/635057258888605696
The R community found ramazon a really helpful package.
And I actually think it is: Amazon AWS is nowadays one of the most common tools for hosting web applications and websites.
tags:
algorithm /amazon /analytics /apps /aws /data analytics /R /Rstudio /shiny /shiny apps /
ramazon: Deploy your Shiny App on AWS with a Function
2015/08/18
Because Afraus attracted a good deal of interest, last month I exceeded the limits of the shinyapps.io free plan.
That made me move my Shiny app to an Amazon AWS instance.
Well, it was not so straightforward: even though there are plenty of tutorials around the web, every one of them seems to miss a part: upgrading the R version, removing the shiny-server examples… And even with all the information at hand, it is still quite a long, error-prone process.
All this pain is removed by ramazon, an R package that I developed to take care of everything needed to deploy a Shiny app on an AWS instance. An early disclaimer for Windows users: only Apple OS X is supported at the moment.
tags:
amazon /analytics /aws /data analysis /hacking /R /shiny /shiny apps /
Introducing Afraus: an Unsupervised Fraud Detection Algorithm
2015/07/02
The latest Report to the Nation published by the ACFE states that, on average, fraud accounts for nearly 5% of companies' revenues.
on average, fraud accounts for nearly 5% of companies' revenues


Projecting this number onto the whole world GDP, it turns out that the "fraud-country" produces something like a GDP 3 times greater than the Canadian GDP.
tags:
algorithm /analytics /apps /computer science /data /data analysis /data analytics /fraud /fraud analytics /internal audit /R /shiny /shiny apps /
How to add a live chat to your Shiny app
2015/05/11
As I am currently working on a Fraud Analytics web application based on Shiny (currently in beta, more on this blog later), I found myself asking: wouldn't it be great to offer live chat support to my web application's visitors?
It would indeed!
[Image: an ancient example of chatting - Camera degli Sposi, Andrea Mantegna, 1465-1474]
tags:
analytics /apps /chat /data analysis /R /shiny /shiny apps /tutorials /
Catching Fraud with Benford's law (and another Shiny App)
2015/02/06
In the early 1900s Frank Benford observed that "1" was more frequent as the first digit in his own logarithms manual.
More than one hundred years later, we can use this curious finding to look for fraud in populations of data.
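Benford's law gives the expected share of each leading digit d as log10(1 + 1/d), so a basic check compares those shares with the observed ones. The sketch below assumes nothing about the Shiny app from the post: the helper name `first_digit` is hypothetical, and powers of 2 stand in for real amounts only because they are known to follow the law closely.

```r
# Expected first-digit probabilities under Benford's law: P(d) = log10(1 + 1/d)
digits   <- 1:9
expected <- log10(1 + 1 / digits)

# Hypothetical helper: leading digit of each non-zero number,
# read from its scientific-notation representation
first_digit <- function(x) {
  as.integer(substr(formatC(abs(x), format = "e", digits = 6), 1, 1))
}

# Illustrative data only: the first 100 powers of 2
amounts  <- 2^(1:100)
observed <- as.numeric(table(factor(first_digit(amounts), levels = digits))) /
  length(amounts)

# Side-by-side comparison; large gaps would flag a population worth a closer look
comparison <- data.frame(
  digit    = digits,
  expected = round(expected, 3),
  observed = observed
)
print(comparison)
```

On real data a formal test (for instance a chi-squared goodness-of-fit test) would normally follow this visual comparison.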
2014
Network Visualisation With R
2014/12/05
The main reason why
After all, I am still an Internal Auditor. Therefore I often face one of the typical internal auditor's problems: understanding the links between people and companies, in order to discover hidden communities that could expose the company to unknown risks.
the solution: linker
In order to address this problem I am developing Linker, a lean Shiny app that takes 1-to-1 links as input and gives a network map as output:
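The excerpt does not say how Linker builds its map, so the following is only a sketch of the general idea using the igraph package, which is an assumption here; the people and company names are invented for illustration.

```r
library(igraph)

# Illustrative 1-to-1 links between people and companies (invented data)
links <- data.frame(
  from = c("Anna", "Bruno", "Carla", "Dario", "Elena"),
  to   = c("Acme Srl", "Acme Srl", "Beta SpA", "Beta SpA", "Acme Srl"),
  stringsAsFactors = FALSE
)

# Build an undirected graph: every distinct name becomes a vertex
network <- graph_from_data_frame(links, directed = FALSE)

# Connected components act as a first, rough notion of "community"
groups <- components(network)
groups$no                                           # number of separate groups
split(names(groups$membership), groups$membership)  # members of each group

# Draw the network map
plot(network, vertex.label.cex = 0.8)
```

For larger networks, a proper community-detection routine would replace the simple connected-components step.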
tags:
analytics /communities /data analysis /data analytics /internal audit /Linker /network analysis /R /
Querying Google With R
2014/11/19
If you have a blog you may want to discover how your website is performing for given keywords on Google Search Engine. As we all know, this topic is not a trivial one.
The problem is that the manual solution would be quite time-consuming, requiring you to look for your website, for every single keyword, across many result pages.
Feeling this way?
[Image: "Pain and fear, pain and fear for me" - Oliver Twist]
tags:
algorithm /analytics /apps /google /R /Rstudio /SEO /shiny /shiny apps /social media /social media analytics /web query /
Best Practices for Scientific Computing
2014/11/05
I reproduce below the principles from the amazing paper Best Practices for Scientific Computing, published in 2012 by a group of US and UK professors. The main purpose of the paper is to "teach" the good programming habits of professional developers to people who weren't born developers and became developers only for professional purposes.
Scientists spend an increasing amount of time building and using software. However, most scientists are never taught how to do this efficiently.
Best Practices for Scientific Computing
Write programs for people, not computers.
1. _a program should not require its readers to hold more than a handful of facts in memory at once_
2. _names should be consistent, distinctive and meaningful_
3. _code style and formatting should be consistent_
4. _all aspects of software development should be broken down into tasks roughly an hour long_
Automate repetitive tasks.
1. _rely on the computer to repeat tasks_
2. _save recent commands in a file for re-use_
3. _use a build tool to automate scientific workflows_
Use the computer to record history.
1. _software tools should be used to track computational work automatically_
Make incremental changes.
1. _work in small steps with frequent feedback and course correction_
Use version control.
1. _use a version control system_
2. _everything that has been created manually should be put in version control_
Don't repeat yourself (or others).
1. _every piece of data must have a single authoritative representation in the system_
2. _code should be modularized rather than copied and pasted_
3. _re-use code instead of rewriting it_
Plan for mistakes.
1. _add assertions to programs to check their operation_
2. _use an off-the-shelf unit testing library_
3. _use all available oracles when testing programs_
4. _turn bugs into test cases_
5. _use a symbolic debugger_
Optimize software only after it works correctly.
1. _use a profiler to identify bottlenecks_
2. _write code in the highest-level language possible_
Document design and purpose, not mechanics.
1. _document interfaces and reasons, not implementations_
2. _refactor code instead of explaining how it works_
3. _embed the documentation for a piece of software in that software_
Collaborate.
1. _use pre-merge code reviews_
2. _use pair programming when bringing someone new up to speed and when tackling particularly tricky problems_
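To make the "Plan for mistakes" principles concrete in R, here is a minimal sketch of an assertion plus a bug turned into a test case. The function `safe_mean` is hypothetical and not taken from the paper; in a real project an off-the-shelf library such as testthat would replace the bare `stopifnot()` checks.

```r
# Assertion: fail early on inputs the function cannot handle
safe_mean <- function(x) {
  stopifnot(is.numeric(x), length(x) > 0)
  mean(x, na.rm = TRUE)
}

# Ordinary test case
stopifnot(safe_mean(c(1, 2, 3)) == 2)

# A (hypothetical) past bug turned into a test case:
# missing values used to make the result NA
stopifnot(safe_mean(c(1, NA, 3)) == 2)

# The assertion itself is tested too: bad input must raise an error
bad_input_rejected <- inherits(try(safe_mean("a"), silent = TRUE), "try-error")
stopifnot(bad_input_rejected)
```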
If you want to discover more, you can download your copy of Best Practices for Scientific Computing below.
tags:
analytics /computer science /data analytics /R /
excel right() function in R
2014/10/27
As part of the **Excel functions in R**, I have developed this custom function, reproducing the Excel right() function in the R language. Feel free to copy and use it.
[code language="r"]
# Return the last `char` characters of `string` (like Excel's RIGHT)
right = function(string, char) {
  substr(string, nchar(string) - (char - 1), nchar(string))
}
[/code]
You can find other functions in the Excel functions in R post.
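A short usage sketch follows. The function is repeated so that the block runs on its own, and the sample strings are illustrative only.

```r
# Same definition as in the post, repeated to keep the example self-contained
right <- function(string, char) {
  substr(string, nchar(string) - (char - 1), nchar(string))
}

right("analytics", 3)                     # "ics"

# substr() and nchar() are vectorised, so a whole column can be processed at once
right(c("2014/10/27", "2015/02/06"), 2)   # "27" "06"
```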
tags:
analytics /data /data analysis /data analytics /excel /excel spreadsheet /functions /R /
excel left() function in R
2014/10/27
As part of the Excel functions in R, I have developed this custom function, emulating the Excel left() function in the R language. Feel free to copy and use it.
[code language="r"]
# Return the first `char` characters of `string` (like Excel's LEFT)
left = function(string, char) {
  substr(string, 1, char)
}
[/code]
You can find other functions in the **Excel functions in R post**.
tags:
analytics /data /data analysis /data analytics /excel /functions /R /
excel functions in R
2014/10/25
I started my "data-journey" with Excel, getting excited by formulas like VLookup(), right() and left().
Then datasets got bigger, I discovered that little spreadsheets were not enough, and I looked for something bigger and stronger, eventually coming to R.
But as you know, one never forgets one's first love.
So, for fun and for practice, I have written some Excel functions in R.
I hope you will enjoy them.
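As an illustration of the idea, an exact-match VLookup() can be sketched in a few lines of base R with `match()`. This is not the author's implementation, which is not shown in this excerpt; the name `vlookup`, its arguments and the sample table are hypothetical.

```r
# Hypothetical helper mimicking an exact-match VLOOKUP(value, table, col_index, FALSE)
vlookup <- function(lookup_value, table, col_index) {
  # Position of the first match in the first column, NA when there is none
  row <- match(lookup_value, table[[1]])
  table[[col_index]][row]
}

# Illustrative lookup table
prices <- data.frame(
  product = c("apple", "pear", "kiwi"),
  price   = c(1.2, 0.8, 2.5),
  stringsAsFactors = FALSE
)

vlookup("pear", prices, 2)                # 0.8
vlookup(c("kiwi", "banana"), prices, 2)   # 2.5 NA
```

Where Excel shows #N/A for a missing key, this version returns `NA`.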
tags:
analytics /excel /excel spreadsheet /R /
Mining Twitter with R
2014/10/09
Great tutorial on text mining with Twitter by Paeng Angnakoon
[youtube=http://youtu.be/mJVcANlkxU8]
tags:
analytics /data analysis /R /tutorials /twitter /
Answering Ben (function comparison in R)
2014/09/13
Following the post about %in% operator, I received this tweet:
https://twitter.com/benwhite21/status/510520550553165824
I took a look at the code kindly provided by Ben and then I asked myself:
I know dplyr is a really nice package, but which snippet is faster?
To answer the question I've put the two snippets into two functions:
[code language="r"]
library(dplyr)

# Ben's snippet (dplyr)
dplyr_snippet = function(object, column, vector) {
  filter(object, object[, column] %in% vector)
}

# AC's snippet (base R)
Rbase_snippet = function(object, column, vector) {
  object[object[, column] %in% vector, ]
}
[/code]
Then, thanks to the great package microbenchmark, I compared the two functions, timing the execution of each one 100,000 times.
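A comparison of this kind can be sketched as follows. The data frame, the selected values and the reduced number of repetitions (`times = 100`) are assumptions made to keep the example quick; they are not the figures from the original test.

```r
library(dplyr)
library(microbenchmark)

# The two snippets under comparison, repeated so the block runs on its own
dplyr_snippet <- function(object, column, vector) {
  filter(object, object[, column] %in% vector)
}
Rbase_snippet <- function(object, column, vector) {
  object[object[, column] %in% vector, ]
}

# Illustrative data: 10,000 random deals
set.seed(1)
sales <- data.frame(
  customer = sample(letters, 1e4, replace = TRUE),
  amount   = runif(1e4),
  stringsAsFactors = FALSE
)
selected <- c("a", "e", "k")

# Run each expression 100 times and collect the timings
timings <- microbenchmark(
  dplyr = dplyr_snippet(sales, "customer", selected),
  base  = Rbase_snippet(sales, "customer", selected),
  times = 100
)
print(timings)
```

The printed summary reports minimum, median and maximum times per expression; the median is usually the fairest figure to compare.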
tags:
analytics /data analysis /data analytics /R /
Code snippet: subsetting data frame in R by vector
2014/09/02
Problem:
You have to subset a data frame, using as the criterion an exact match with the contents of a vector.
For instance:
You have a dataset with some attributes, and a vector holding some values of one of those attributes. You want to filter the dataset based on the values in the vector.
Example: sales records, each record is a deal.
The vector is a list of selected customers you are interested in.
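The sales example can be sketched with the `%in%` operator in base R. The column names and values below are invented for illustration.

```r
# Illustrative sales records: each row is a deal
sales <- data.frame(
  deal_id  = 1:6,
  customer = c("Rossi", "Bianchi", "Verdi", "Rossi", "Neri", "Verdi"),
  amount   = c(100, 250, 80, 40, 300, 120),
  stringsAsFactors = FALSE
)

# The vector of customers we are interested in
selected_customers <- c("Rossi", "Verdi")

# Keep only the rows whose customer exactly matches one of the vector's values
selected_sales <- sales[sales$customer %in% selected_customers, ]
print(selected_sales)
```

`%in%` returns one TRUE/FALSE per row, so it can be used directly as a row index.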
tags:
analytics /data analysis /R /tutorials /
Saturation with Parallel Computation in R
2014/07/28
I have just maxed out my whole PC:
the 4 GB of RAM is full

and so is the CPU (i7-4770 @ 3.4 GHz)

Parallel Computation in R
What is my secret?
The doParallel package for R on Mac.
The package lets you run very useful parallel computations, making it possible to use the full potential of your CPU.
As a matter of fact, by default R uses just one of the cores available on your PC.
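A minimal sketch of how doParallel is typically used together with foreach follows. The task itself (square roots of a few numbers) is a placeholder chosen for illustration, and leaving one core free is simply a cautious default, not something stated in the post.

```r
library(doParallel)  # also attaches foreach and parallel

# Use all cores but one; fall back to a single worker if detection fails
n_cores <- max(1, parallel::detectCores() - 1, na.rm = TRUE)

# Start the workers and register them as the backend for %dopar%
cl <- makeCluster(n_cores)
registerDoParallel(cl)

# Each iteration runs on one of the workers; results are combined with c()
results <- foreach(i = 1:8, .combine = c) %dopar% {
  sqrt(i)
}

# Always release the workers when done
stopCluster(cl)

print(results)
```

Swapping `%dopar%` for `%do%` runs the same loop sequentially, which is handy for debugging.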
tags:
analytics /data analysis /R /tutorials /
How to Visualize Entertainment Expenditures on a Bubble Chart
2014/07/12

I've recently been asked to analyze some Board entertainment expenditures in order to acquire sufficient assurance about their nature and who was responsible for them.
In response to that request I have developed a little Shiny app with an interesting reactive bubble chart.
The plot, made using the ggplot2 package, is composed of:
A categorical x value, represented by the clusters identified in the expenditures population
A numerical y value, representing the total amount spent
Points defined by the total amount of expenditure in the given cluster for each company subject.
Moreover, point size is given by the ratio between the amount regularly passed through the Accounts Receivable process and the total amount of expenditure for that subject in that cluster.
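The structure described above maps directly onto ggplot2 aesthetics. The sketch below uses invented data and hypothetical column names (`cluster`, `subject`, `total_amount`, `regular_ratio`); it leaves out the Shiny reactivity and only shows the static chart.

```r
library(ggplot2)

# Invented data: one row per company subject and expenditure cluster
expenditures <- data.frame(
  cluster       = c("travel", "travel", "gifts", "gifts", "dinners", "dinners"),
  subject       = c("Company A", "Company B", "Company A", "Company B",
                    "Company A", "Company B"),
  total_amount  = c(12000, 8000, 3000, 4500, 6000, 9500),
  regular_ratio = c(0.90, 0.60, 0.30, 0.80, 0.95, 0.50),
  stringsAsFactors = FALSE
)

bubble_chart <- ggplot(
  expenditures,
  aes(x = cluster,          # categorical x: expenditure cluster
      y = total_amount,     # numerical y: total amount spent
      size = regular_ratio, # bubble size: share passed through the regular process
      colour = subject)     # one colour per company subject
) +
  geom_point(alpha = 0.7) +
  scale_size_area(max_size = 15) +
  labs(x = "Expenditure cluster", y = "Total amount",
       size = "Regular share", colour = "Subject")

bubble_chart
```

`scale_size_area()` makes the bubble area, rather than the radius, proportional to the ratio, which avoids visually exaggerating large values.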
tags:
analytics /data analysis /data analytics /R /shiny apps /