r-tastic

Weird and wonderful exploration of data using R

Animated Plots As Part Of Exploratory Data Analysis

The internet seems to be booming with blog posts on animated graphs, some for serious purposes and some not so much. I used to think of them as little more than a gimmick or a cool way of spicing up a conference talk. However, I’m a total convert now, and in this post I want to show the real value that such graphs can add to your (absolutely serious!) exploratory analysis.

Cluster Validation In Unsupervised Machine Learning

In the previous post I showed several methods that can be used to determine the optimal number of clusters in your data - this often needs to be defined for the actual clustering algorithm to run. Once it’s run, however, there’s no guarantee that those clusters are stable and reliable. In this post I’ll show a couple of tests for cluster validation that can be easily run in R.

Determining the optimal number of clusters in your dataset

Recently, I worked a bit with cluster analysis: a common method in unsupervised learning that uses datasets without labeled responses to draw inferences. I wanted to put my notes together and write it all down before I forget it, hence this blog post. To start, I’ll tackle several approaches to determining the number of clusters in your data. Quick intro: clustering algorithms aim to establish the structure of your data and assign a cluster/segment to each datapoint based on the input data.
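
To make the idea concrete, here is a minimal sketch of one widely used heuristic, the elbow plot, in base R. The excerpt does not say which approaches the post covers, so the choice of heuristic, the built-in iris data and the range of k are all assumptions made for illustration.

```r
# Elbow heuristic with base R k-means.
# Assumptions: iris as example data, k from 1 to 8, 25 random starts.
set.seed(42)                       # k-means starts from random centres
x <- scale(iris[, 1:4])            # standardise the four numeric columns

ks  <- 1:8
wss <- sapply(ks, function(k) {
  # total within-cluster sum of squares for a k-cluster solution
  kmeans(x, centers = k, nstart = 25)$tot.withinss
})

# Look for the "elbow": the k after which wss stops dropping sharply
plot(ks, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

The elbow is read off by eye, so it is best treated as one piece of evidence rather than a definitive answer.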

Scraping Online Table With Info on R datasets

This is a very quick post on how to get a list of datasets available from within R along with a basic description of each (which package it can be found in, and its number of observations and variables). It always takes me some time to find the right dataset to showcase whatever process or method I’m working with, so this was really to make my life easier. So! I’m going to scrape the table with a list of R datasets from here using the rvest and xml2 packages:
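
A minimal sketch of that kind of table scrape is shown below. The source URL is not given in this excerpt, so the example parses a small inline HTML table instead; the table markup and column names are hypothetical stand-ins, and html_element() assumes rvest 1.0.0 or later.

```r
library(rvest)   # also makes xml2's read_html() available

# Hypothetical stand-in for the real page: a tiny table of R datasets
html <- '<table>
  <tr><th>Package</th><th>Item</th><th>Rows</th><th>Cols</th></tr>
  <tr><td>datasets</td><td>iris</td><td>150</td><td>5</td></tr>
  <tr><td>datasets</td><td>mtcars</td><td>32</td><td>11</td></tr>
</table>'

page <- read_html(html)                          # parse the HTML string
tbl  <- html_table(html_element(page, "table"))  # first <table> as a data frame

print(tbl)
```

For a live page, the same two calls would apply with the page URL passed to read_html() in place of the string.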

Harari: Sentiment Analysis

So! Following my previous blog post, where I scraped Amazon reviews of Yuval Harari’s Sapiens to create a wordcloud based on them, here I will compare the results of sentiment analysis performed on two of Harari’s books: Sapiens and Homo Deus. A quick intro: for context, Sapiens was originally published in Hebrew in 2011. As Wikipedia puts it, Sapiens “surveys the history of humankind from the evolution of archaic human species in the Stone Age up to the twenty-first century.”