R tools in readr for reading in fixed width files and other formats (CC249)

The readr R package has numerous tools for reading in various types of text files, including fixed width files (fwf), tab separated values (tsv) files, and comma separated values (csv) files. In this episode Pat shows how to read in fwf files and some useful tools that are shared across readr functions for making it easier to read in one or many files. He also discusses various arguments in the tar program for extracting files from compressed archives to specific directories. The overall goal of this project is to highlight reproducible research practices using a number of tools. The specific output from this project will be a map-based visual that shows the level of drought across the globe.
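
As a rough illustration of the topics listed in the chapters below (fixed width input, NA values, column types), here is a minimal sketch using `read_fwf()`. The file contents, column widths, column names, and the `-9999` missing-value code are all made up for this example and are not taken from the episode.

```r
library(readr)

# Hypothetical fixed width records: id (4 chars), year (4), month (2), value (5)
lines <- c(
  "ST01202201  123",
  "ST01202202-9999",
  "ST02202201   45"
)
path <- tempfile(fileext = ".txt")
writeLines(lines, path)

obs <- read_fwf(
  path,
  # Column widths and names are assumptions for this toy file
  col_positions = fwf_widths(c(4, 4, 2, 5), c("id", "year", "month", "value")),
  # Treat the made-up sentinel -9999 as missing
  na = c("", "NA", "-9999"),
  # State the column types explicitly instead of letting readr guess
  col_types = cols(
    id = col_character(),
    year = col_integer(),
    month = col_integer(),
    value = col_double()
  )
)

print(obs)
```

The `na` and `col_types` arguments are shared with `read_csv()` and `read_tsv()`, so the same pattern carries over to those functions.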

#readr #read_fwf #tar #R #Rstats

Support Riffomonas by becoming a Patreon member!

You can also find complete tutorials for learning R with the tidyverse using...

0:00 Introduction
5:53 Figuring out how to read in a file
10:28 Reading in fixed width data with read_fwf
16:58 How to specify NA values with readr functions
17:36 How to specify column types
19:09 How to read in specific columns
20:06 How to read in multiple files at once
21:55 Cleaning and outputting data
Comments

Thanks for the video, really interesting and useful!


Cleaning files is where too much time is spent. Obviously it's part of the job we signed up for. For fun I'm doing text analysis on my favorite card game, and every time I add a new analysis I go back and have to update my cleaning function--removing nonsense values, renaming factors to something human readable, correcting wrong values, removing whitespace and most recently, replacing invisible characters like \u00A0 that mess up parsing.

qwerty

Subscribed and turned on the bell for this channel. Brilliant.

alphaena

As always, very informative and useful 👍. Thank you so much!
Could you please consider doing a video on how to set up VS code for use with R (the extensions, configurations etc)?

thespaniardinme

I did this for the ISD (Integrated Surface Database) from NCDC as a PhD student using R's base read fixed width function. Started out as kind of cool, but got tedious quickly, as the number of variables and more was not constant from file to file! Ugh!!

rubenbehnke

Hello, is it possible to use paste instead of glue for renaming columns, like paste(c("namesA", "namesB", "namesC", "namesD"), 1:x)?

benjaminbulle
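
A possible answer to the question above, sketched in base R only: `paste()` is vectorised and recycles its arguments pairwise, so it can build column names, but it inserts a space by default and does not produce every prefix and number combination on its own. The prefixes and the value of `x` here are hypothetical, and how the episode builds its names with glue is not shown in this text.

```r
# Hypothetical prefixes and count, chosen only for illustration
prefixes <- c("namesA", "namesB")
x <- 2

# paste() pairs elements up and recycles the shorter vector,
# separating the pieces with a space by default
paired <- paste(prefixes, 1:4)
print(paired)

# paste0() drops the separator; rep() spells out every
# prefix/number combination, grouped by number
col_names <- paste0(
  rep(prefixes, times = x),
  rep(seq_len(x), each = length(prefixes))
)
print(col_names)
```

So `paste0()` can stand in for glue here, as long as the recycling is made explicit with `rep()`.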

Shouldn’t as.numeric(prcp)/100 be as.numeric(prcp)/10?

ipf

I have seen worse. Try to read in data from a table in Word that has multiple groups set apart with a new row.

haraldurkarlsson