Tuesday, 24 January 2017

24.01.2017 - Geo Mapping


5 State Data in Google GeoChart
4 Plotting Cities on a Map along with Data
Showing Addresses on a Google Map

15.01.2017 - Google Charts


This one is called the 'Gantt Chart'.

Thursday, 12 January 2017

11.01.2017 - R

For this visualization techniques session, let's use the 'Big Mart' dataset. You can download it here.

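As always, let's start by loading the ggplot2 library:

library(ggplot2)
# The code below assumes the Big Mart data is already loaded as a data frame named Big_Mart_Dataset_Sheet1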

# Scatter plot


A Scatter Plot is used to see the relationship between two continuous variables.

ggplot(Big_Mart_Dataset_Sheet1, aes(Item_Visibility, Item_MRP)) +
  geom_point() +
  scale_x_continuous("Item Visibility", breaks = seq(0, 0.35, 0.05)) +
  scale_y_continuous("Item MRP", breaks = seq(0, 270, by = 30)) +
  theme_bw()

ggplot(Big_Mart_Dataset_Sheet1, aes(Item_Visibility, Item_MRP)) +
  geom_point(aes(color = Item_Type)) +
  scale_x_continuous("Item Visibility", breaks = seq(0,0.35,0.05))+
  scale_y_continuous("Item MRP", breaks = seq(0,270,by = 30))+
  theme_bw() + labs(title="Scatterplot")
 
# facet_wrap() draws one panel per Item_Type and wraps the panels into a rectangular grid.
ggplot(Big_Mart_Dataset_Sheet1, aes(Item_Visibility, Item_MRP)) + geom_point(aes(color = Item_Type)) +
  scale_x_continuous("Item Visibility", breaks = seq(0,0.4,0.1))+
  scale_y_continuous("Item MRP", breaks = seq(0,270,by = 30))+
  theme_bw() + labs(title="Scatterplot") + facet_wrap( ~ Item_Type)
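
A scatter plot shows a relationship visually; Pearson's correlation coefficient puts a single number on how strong the linear part of that relationship is. A minimal base R sketch, using the built-in mtcars data as a stand-in because the Big Mart file is not bundled with R:

```r
# Built-in mtcars data stands in for Big Mart here (an assumption for illustration)
x <- mtcars$wt   # car weight (1000 lbs)
y <- mtcars$mpg  # fuel economy (miles per gallon)

# Pearson correlation: +1 = perfect positive linear relationship,
# -1 = perfect negative linear relationship, 0 = no linear relationship
r <- cor(x, y, method = "pearson")
print(round(r, 2))
```

On the Big Mart data the same idea would be cor(Big_Mart_Dataset_Sheet1$Item_Visibility, Big_Mart_Dataset_Sheet1$Item_MRP). A value near 0 only rules out a linear trend, so it is still worth checking the scatter plot for curved patterns.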

#Histogram


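A histogram is used to plot a continuous variable. It breaks the data into bins and shows the frequency distribution across those bins. We can always change the bin size and see the effect it has on the visualization: wide bins smooth over detail, while very narrow bins can make the distribution look noisy.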
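
The post describes histograms without showing code, so here is a minimal base R sketch of the binning step itself. hist(..., plot = FALSE) returns the bin edges and counts without drawing anything, which makes the effect of bin size easy to see. The built-in mtcars$mpg column stands in for a Big Mart variable:

```r
x <- mtcars$mpg  # stand-in continuous variable (built-in data, not Big Mart)

# Same data, two bin widths: 5 mpg and 2 mpg (breaks must cover the data range)
wide   <- hist(x, breaks = seq(10, 35, by = 5), plot = FALSE)
narrow <- hist(x, breaks = seq(10, 36, by = 2), plot = FALSE)

# Each element of $counts is the number of observations falling in one bin
print(data.frame(bin_start = head(wide$breaks, -1), count = wide$counts))
cat("bins with width 5:", length(wide$counts), "\n")
cat("bins with width 2:", length(narrow$counts), "\n")
```

In ggplot2 the same choice is made with geom_histogram(binwidth = ...) or geom_histogram(bins = ...), for example ggplot(Big_Mart_Dataset_Sheet1, aes(Item_Outlet_Sales)) + geom_histogram(binwidth = 500).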

 

#Bar & Stacked Bar Chart


Bar charts are recommended when you want to plot a categorical variable, or a combination of a continuous and a categorical variable.

# Only the last complete theme_*() call takes effect, so one is enough
ggplot(Big_Mart_Dataset_Sheet1, aes(Outlet_Establishment_Year)) + geom_bar(fill = "red") +
  scale_x_continuous("Establishment Year", breaks = seq(1985, 2010)) +
  scale_y_continuous("Count", breaks = seq(0, 1500, 150)) +
  coord_flip() + labs(title = "Bar Chart") + theme_gray()
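
With a single categorical variable, geom_bar() simply counts rows per category. A minimal base R sketch of those counts with table(), using the cylinder column of the built-in mtcars data as a stand-in for Outlet_Establishment_Year:

```r
# Stand-in categorical variable: number of cylinders in the built-in mtcars data
counts <- table(mtcars$cyl)

# These frequencies are the bar heights geom_bar() would draw for aes(cyl)
print(counts)
```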
 
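Because of coord_flip(), the chart above is drawn with horizontal bars. Other variations of this visualization include the vertical bar chart and the stacked bar chart.

Vertical Bar Chart: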


ggplot(Big_Mart_Dataset_Sheet1, aes(Item_Type, Item_Weight)) +
  # stat = "identity" stacks each row's Item_Weight, so a bar's height is the total weight for that Item_Type
  geom_bar(stat = "identity", fill = "darkblue") +
  scale_x_discrete("Item Type") +
  scale_y_continuous("Item Weight", breaks = seq(0, 15000, by = 500)) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) +
  labs(title = "Bar Chart")

Stacked Bar Chart:

ggplot(Big_Mart_Dataset_Sheet1, aes(Outlet_Location_Type, fill = Outlet_Type)) + geom_bar()+
  labs(title = "Stacked Bar Chart", x = "Outlet Location Type", y = "Count of Outlets")
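
A stacked bar chart is built from a two-way frequency table: one bar per level of the x variable, split into segments by the fill variable. A minimal base R sketch, again with mtcars standing in for Big Mart (cyl plays the role of Outlet_Location_Type, gear the role of Outlet_Type):

```r
# Rows = x-axis categories (one bar each), columns = fill categories (segments)
stacked <- table(cyl = mtcars$cyl, gear = mtcars$gear)
print(stacked)

# Each row is one bar; its cells are the stacked segments and the row sum is the bar height
print(rowSums(stacked))
```

With the real data this would be table(Big_Mart_Dataset_Sheet1$Outlet_Location_Type, Big_Mart_Dataset_Sheet1$Outlet_Type).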



#Box plot


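Box plots are used to plot a combination of categorical and continuous variables. They are useful for visualizing the spread of the data and detecting outliers. A box plot summarizes each group with five numbers: the minimum, the 25th percentile, the median, the 75th percentile and the maximum. In ggplot2, the whiskers stop at the most extreme points within 1.5 times the interquartile range (IQR) of the box, and points beyond that are drawn individually as outliers.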

ggplot(Big_Mart_Dataset_Sheet1, aes(Outlet_Identifier, Item_Outlet_Sales)) + geom_boxplot(fill = "red")+
  scale_y_continuous("Item Outlet Sales", breaks= seq(0,15000, by=500))+
  labs(title = "Box Plot", x = "Outlet Identifier")
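
To make the whisker rule concrete, here is a minimal base R sketch on a small, made-up vector (toy numbers for illustration only, not Big Mart data). It computes the quartiles with quantile(), as ggplot2 does; base R's boxplot() uses Tukey's hinges via fivenum() instead, so the two can differ slightly on small samples.

```r
# Toy values (made up for illustration), with one obvious outlier
x <- c(2, 4, 4, 5, 6, 7, 8, 9, 30)

# 25th percentile, median and 75th percentile (quantile() default, type 7)
q <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE)
iqr <- q[3] - q[1]

# Whiskers reach the most extreme points within 1.5 * IQR of the box;
# anything beyond the fences is drawn as an individual outlier point
lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[3] + 1.5 * iqr
outliers <- x[x < lower_fence | x > upper_fence]
whiskers <- range(x[x >= lower_fence & x <= upper_fence])

print(q); print(whiskers); print(outliers)
```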
 
#Area Chart


An area chart is used to show continuity across a variable or data set. It is essentially a line chart with the area under the line filled in, and it is commonly used for time series plots. It can also be used to plot a continuous variable and analyze the underlying trend; here, stat = "bin" counts Item_Outlet_Sales in 30 bins, much as a histogram would.


ggplot(Big_Mart_Dataset_Sheet1, aes(Item_Outlet_Sales)) +
  geom_area(stat = "bin", bins = 30, fill = "steelblue") +
  scale_x_continuous(breaks = seq(0,11000,1000))+
  labs(title = "Area Chart", x = "Item Outlet Sales", y = "Count")

#Heat Map

 

A heat map uses color intensity to display the relationship between two or more variables in a two-dimensional image: two variables form the axes, and a third is shown by the intensity of the color.

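# Many rows share each Outlet_Identifier/Item_Type pair, so average Item_MRP per cell first;
# otherwise overlapping tiles hide all but one value
mrp_by_cell <- aggregate(Item_MRP ~ Outlet_Identifier + Item_Type,
                         data = Big_Mart_Dataset_Sheet1, FUN = mean)
ggplot(mrp_by_cell, aes(Outlet_Identifier, Item_Type)) +
  geom_raster(aes(fill = Item_MRP)) +
  labs(title = "Heat Map", x = "Outlet Identifier", y = "Item Type") +
  scale_fill_continuous(name = "Item MRP")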
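
A heat map's input is a grid with one value per combination of the two axis variables. A minimal base R sketch of building that grid with tapply(), using the built-in mtcars data as a stand-in for Big Mart (cyl and gear play the roles of the two categorical axes, mean mpg the fill value):

```r
# One cell per (cyl, gear) combination, holding the mean mpg of the cars in it
cell_means <- tapply(mtcars$mpg, list(cyl = mtcars$cyl, gear = mtcars$gear), mean)
print(round(cell_means, 1))

# Rows and columns become the two axes; each value becomes a tile's colour.
# Combinations with no cars come back as NA (empty tiles).
```

With the Big Mart columns, the equivalent would be tapply(Big_Mart_Dataset_Sheet1$Item_MRP, list(Big_Mart_Dataset_Sheet1$Item_Type, Big_Mart_Dataset_Sheet1$Outlet_Identifier), mean).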
 
#Correlogram


A correlogram is used to examine the level of correlation among the variables in a data set. The cells of the matrix are shaded or colored to show each correlation value: the darker the color, the stronger the correlation. With panel.shade, positive correlations are displayed in blue and negative correlations in red, and the color intensity is proportional to the size of the correlation.


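install.packages("corrgram")  # only needed once
library(corrgram)

# Correlations are only defined for numeric variables, so drop the text columns first
big_mart_numeric <- Big_Mart_Dataset_Sheet1[sapply(Big_Mart_Dataset_Sheet1, is.numeric)]
corrgram(big_mart_numeric, order = NULL, panel = panel.shade, text.panel = panel.txt,
         main = "Correlogram")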
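
The correlogram is a picture of a correlation matrix. A minimal base R sketch of that matrix, using the built-in iris data as a stand-in (like Big Mart, it mixes numeric columns with a non-numeric one, Species). The use = "pairwise.complete.obs" argument is an optional assumption here: it lets cor() skip missing values pair by pair if a column such as Item_Weight has gaps; iris itself has none.

```r
# Built-in iris data stands in for Big Mart: four numeric columns plus one factor
df <- iris

# cor() needs numeric input, so keep only the numeric columns first
numeric_cols <- df[sapply(df, is.numeric)]

# Correlation matrix: one cell per pair of variables, which is what a correlogram shades
cm <- cor(numeric_cols, use = "pairwise.complete.obs")
print(round(cm, 2))
```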

 

That's all for this post. More on visualizations in later posts.