I've stopped caring about the estimates that periodically make the rounds on the internet about how much time data scientists spend on data wrangling vs. modelling. The answer is: probably a lot, and likely more than originally planned (depending, I suspect, on the state and richness of the input data and on its intended application, ahem). Still, the right tools can go a long way toward achieving the desired result in a time frame that can surprise even the most optimistic among us.
It goes without saying that, like so many others, I'm super excited about the premiere of another Star Wars movie. That excitement coincided with Piotr Migdal's challenge, posted in the Data Science PL group on Facebook, in which he suggested comparing word frequencies between two different sources. It didn't take me long to decide which sources to choose! So in this short and sweet blog post I compare word frequencies between two movie scripts: “Star Wars: A New Hope” (1977) and “Star Trek: The Motion Picture” (1979).
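A minimal base-R sketch of such a comparison, assuming each script has already been read into a single character string (for example with `readLines()` followed by `paste(collapse = " ")`). `word_freq()`, `count_in()` and `compare_freq()` are hypothetical helpers, not the code behind this post, and the add-one smoothing in the log ratio is simply one common way to stop words that appear in only one script from producing `log(0)`.

```r
# Hypothetical helper: lower-case the text, split it on any run of characters
# that are not ASCII letters a-z or apostrophes, and count each word.
word_freq <- function(text) {
  words <- unlist(strsplit(tolower(text), "[^a-z']+"))
  words <- words[nzchar(words)]          # drop empty strings left by splitting
  sort(table(words), decreasing = TRUE)
}

# Hypothetical helper: look up counts for `words`, using 0 for absent words.
count_in <- function(freq, words) {
  counts <- as.vector(freq)[match(words, names(freq))]
  counts[is.na(counts)] <- 0
  counts
}

# Hypothetical helper: side-by-side counts plus a smoothed log2 ratio of
# relative frequencies (positive = more typical of text_a than text_b).
compare_freq <- function(text_a, text_b) {
  fa <- word_freq(text_a)
  fb <- word_freq(text_b)
  words <- union(names(fa), names(fb))
  n_a <- count_in(fa, words)
  n_b <- count_in(fb, words)
  # Add-one smoothing so a word missing from one text does not give log(0)
  p_a <- (n_a + 1) / sum(n_a + 1)
  p_b <- (n_b + 1) / sum(n_b + 1)
  data.frame(word = words, n_a = n_a, n_b = n_b,
             log_ratio = log2(p_a / p_b), stringsAsFactors = FALSE)
}

# Toy input made up for illustration (not actual lines from either script)
cmp <- compare_freq("The Force, the force; Jedi!", "The ship, the crew.")
cmp[order(cmp$log_ratio, decreasing = TRUE), ]
```

With the Star Wars script as `text_a` and the Star Trek one as `text_b`, sorting by `log_ratio` would put the most Star Wars-flavoured words at the top and the most Star Trek-flavoured ones at the bottom.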
This is a very quick post on how to get a list of the datasets available from within R, together with a basic description of each (the package it can be found in and its number of observations and variables). It always takes me a while to find the right dataset to showcase whatever process or method I'm working on, so this was really about making my own life easier. So! I'm going to scrape the table listing R datasets from here using the rvest and xml2 packages:
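A minimal sketch of that scraping step, assuming the page presents the list as an ordinary HTML `<table>`. Since the original link isn't reproduced here, the example parses a small made-up table passed as a literal HTML string, which `xml2::read_html()` accepts just as it accepts a URL. `scrape_first_table()` is a hypothetical helper, and the `Package`, `Item`, `Rows` and `Cols` column names are assumptions; the real page may use different ones.

```r
library(xml2)   # read_html()
library(rvest)  # html_table()

# Hypothetical helper: parse a page (a URL, a file path or a literal HTML
# string) and return its first <table> as a plain data frame.
scrape_first_table <- function(source) {
  page <- read_html(source)
  tables <- html_table(page)   # a list with one data frame per <table>
  if (length(tables) == 0) stop("no <table> found in the page")
  as.data.frame(tables[[1]])
}

# Made-up stand-in for the real page so the sketch runs offline.
# The header row uses <th> cells, so html_table() treats it as column names.
toy_html <- paste0(
  "<html><body><table>",
  "<tr><th>Package</th><th>Item</th><th>Rows</th><th>Cols</th></tr>",
  "<tr><td>datasets</td><td>mtcars</td><td>32</td><td>11</td></tr>",
  "<tr><td>datasets</td><td>iris</td><td>150</td><td>5</td></tr>",
  "</table></body></html>"
)

datasets_tbl <- scrape_first_table(toy_html)
datasets_tbl
```

To run it against the real page, pass that page's URL to `scrape_first_table()` instead of `toy_html`; if the page contains several tables, pick the right element of the list returned by `html_table()` rather than assuming it is the first.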