r-tastic

Weird and wonderful exploration of data using R

Trump VS Clinton Interpretable Text Classifier

I’ve been writing and talking a lot about LIME recently: on this blog, at the H2O meetup, and at the upcoming AI Congress, and I’m still sooo impressed by this tool for interpreting any algorithm, even a black-box one! The part I love most is that LIME can be applied to both image and text data, which was well showcased in the husky VS wolf (image) and Christian VS atheist (text) examples in the original publication. Thomas Lin Pedersen did an amazing job building the lime package for R, with excellent documentation and a vignette.

End of Year thoughts

Sometimes it’s worth making New Year resolutions… A year ago I made one for 2017: to start an R blog using RMarkdown and Jekyll static sites. At the time, I didn’t even know git that well, had no clue what static sites were and was mostly oblivious to the rich and vibrant R community on Twitter. Fast-forward one year and… the picture couldn’t be any more different! I’d like to share my thoughts on writing this blog (and data science blogs in general) and what it taught me about getting stuff done.

Star Wars Vs Star Trek Word Battle

It goes without saying that I’m super excited about the premiere of another Star Wars movie, and I’m not the only one. This, together with Piotr Migdal’s challenge posted on the Data Science PL group on Facebook, where he suggested comparing word frequencies between two different sources, inspired this post. It didn’t take me long to decide which sources to choose! So in this short and sweet blogpost I’m comparing word frequencies between two movie scripts: “Star Wars: A New Hope” (1977) and “Star Trek: The Motion Picture” (1979).

Automated and Unmysterious Machine Learning in Cancer Detection

I get bored of doing two things: i) spot-checking and optimising the parameters of my predictive models and ii) reading about how ‘black box’ machine learning (particularly deep learning) models are and how little we can do to better understand how they learn (or fail to learn, for example when they mistake a panda bear for a vulture!). In this post I’ll test a) H2O’s function h2o.automl(), which may help me automate the former, and b) Thomas Lin Pedersen’s library(lime), which may help clarify the latter.

Friendships among top R-twitterers

Have you ever wondered whether the most active/popular R-twitterers are virtual friends? :) And by friends here I simply mean mutual followers on Twitter. In this post, I score and pick the top 30 #rstats Twitter users and analyse their Twitter friends’ network. You’ll see a lot of applications of the rtweet and ggraph packages, as well as a very useful twist using the purrr library, so let’s begin! BEFORE I START: OFF-TOPIC ON PERFECTIONISM After weeks and months (!