Practical shell commands for data scientists / analysts (command line tutorial)

I made this video as a practical command line tutorial for data scientists and anyone else who does data preparation, data cleansing, analytics or data science work. I truly believe mastery of shell commands should be second only to R and Python in anyone's arsenal.

00:00 Intro
01:07 Dataset from Kaggle
03:57 Introduction to shell
05:22 date command in bash (zsh)
06:45 Shell variables
10:40 Environment variables vs Shell variables
11:45 cd (change directory) command
12:17 mkdir (make directory) command
13:21 rm -r (remove recursively) command
13:48 mv (move) command
14:44 pwd (print working directory) command
15:24 && in bash
16:31 unzip (unzip a file) command
18:11 ls (list) command
19:27 cp (copy) command
20:15 -v (verbose) option
24:01 wc (word count) command
26:21 * for wildcard
27:22 head / tail -n (first / last n lines of file)
30:25 | (pipe character) in shell
32:35 cut command in bash
36:06 shell redirection
37:17 >> to append to a file
38:07 cat (concatenate) to print contents
39:08 cut -d, -f colnum for csv
46:33 sort command
57:53 uniq command
01:00:30 arithmetic (add, subtract, etc.) in shell
01:02:22 $(( )) for arithmetic expansion
01:02:58 expr (to evaluate an expression in Unix)
01:06:13 data cleansing / preparation / exploratory scenarios in shell (bash)
01:15:46 paste command
01:16:37 bc in shell (basic calculator)

Notes:
Two main types of variables
1. Environment variables
2. Shell variables
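
A minimal sketch of the difference, assuming a POSIX-style shell: a shell variable stays inside the current shell, while `export` turns it into an environment variable that child processes inherit. The name `my_label` is made up for illustration.

```shell
# A shell variable is visible only in the current shell.
my_label="retail"

# A child process does not inherit it, so the child prints an empty value.
child_before=$(sh -c 'echo "$my_label"')
echo "child sees: '$child_before'"

# export promotes it to an environment variable, which children do inherit.
export my_label
child_after=$(sh -c 'echo "$my_label"')
echo "child sees: '$child_after'"
```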

cat: concatenate (print content of a file)
cd: change directory
ls: list
mkdir: make directory
mv: move
pwd: print working directory
rm: remove
cp: copy
&&: logical AND (run the second command only after the first has executed successfully)
`>` (angle bracket): shell redirection (sends output to a file, overwriting it)
`>>` (double angle bracket): append operator (appends output to a file)
`$((newval-oldval))`: perform arithmetic
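
The operators above can be tried together in a throwaway directory. This is only a sketch: the directory, the file name `notes.txt` and the two values are made up for illustration.

```shell
# Work in a temporary directory so no existing files are touched.
workdir=$(mktemp -d)

# && runs cd only if mkdir succeeded.
mkdir -p "$workdir/demo" && cd "$workdir/demo"

# > creates (or overwrites) the file; >> appends to it.
echo "first line" > notes.txt
echo "second line" >> notes.txt
cat notes.txt

# $(( )) performs integer arithmetic on shell variables.
oldval=120
newval=150
diff=$((newval - oldval))
echo "$diff"
```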

### Exploratory commands
- `wc`: word count
  - `-l` for lines, `-w` for words, `-c` for bytes (`-m` for characters)
- `head`, `tail`
  - `-n` for number of lines
- `|` connects the standard output of the process on the left to the standard input of the process on the right
- `cut`
  - `-c` for characters (e.g. `-c 1-5`, `-c 6,8,9`, `-c -11`)
  - `-d` for delimiter (tab by default; commonly a comma or a space)
  - `-f` for field number (e.g. `-f 2,3,5`)
- `sort`
  - `-o` output file
  - `-r` reverse
  - `-u` unique
  - `-k` sort by a specific column
  - `-n` sort numerically
- `uniq`
  - `-c` counts
- `bc`: basic calculator
  - `echo "2+3" | bc`
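
A short sketch of how these commands combine, using a small made-up CSV rather than the dataset from the video. The column names and values are assumptions for illustration only.

```shell
# Hypothetical sample data, created here so the example is self-contained.
sample=$(mktemp)
printf '%s\n' \
  'id,country,price' \
  '1,France,2.50' \
  '2,Germany,10.00' \
  '3,France,4.25' \
  '4,Spain,1.75' > "$sample"

# Count lines (the header is included in the count).
line_count=$(wc -l < "$sample" | tr -d ' ')
echo "$line_count"

# Show the first two lines, then the last two.
head -n 2 "$sample"
tail -n 2 "$sample"

# Take the second column, skip the header, then count each distinct value.
# uniq only collapses adjacent duplicates, so sort has to come first.
country_counts=$(tail -n +2 "$sample" | cut -d, -f 2 | sort | uniq -c | sort -rn)
echo "$country_counts"

# Sort rows by the third column numerically, highest first, and keep the top row.
top_row=$(tail -n +2 "$sample" | sort -t, -k3,3 -nr | head -n 1)
echo "$top_row"
```

Note that `cut -d,` splits on every comma, so it is only reliable when no field contains a quoted comma.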

Scenarios
1. Tabulate and return the top 20 destination countries along with the counts
2. [...] and Customer ID of the top 30 most expensive unit-priced items.
3. Find the 5 most expensive items, optionally sum them up
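
One possible way to approach the country tabulation and the 5 most expensive items, sketched on made-up rows. The column order (`InvoiceNo,Description,UnitPrice,CustomerID,Country`) is an assumption and will likely differ from the actual Kaggle file, so the field numbers passed to `cut` would need adjusting.

```shell
# Hypothetical rows standing in for the real dataset; the column order is assumed.
orders=$(mktemp)
printf '%s\n' \
  'InvoiceNo,Description,UnitPrice,CustomerID,Country' \
  '1001,Mug,3.50,17850,United Kingdom' \
  '1002,Lamp,24.95,13047,France' \
  '1003,Clock,12.00,12583,France' \
  '1004,Frame,7.25,17850,United Kingdom' \
  '1005,Teapot,18.40,14688,Germany' \
  '1006,Candle,1.95,15311,United Kingdom' > "$orders"

# Top countries: count rows per country (field 5) and keep the 20 most frequent.
top_countries=$(tail -n +2 "$orders" | cut -d, -f 5 | sort | uniq -c | sort -rn | head -n 20)
echo "$top_countries"

# Most expensive items: the 5 highest unit prices (field 3), sorted numerically.
top_prices=$(tail -n +2 "$orders" | cut -d, -f 3 | sort -rn | head -n 5)
echo "$top_prices"

# Optional sum: paste joins the lines with "+", and bc evaluates the expression.
sum_expr=$(echo "$top_prices" | paste -sd+ -)
echo "$sum_expr"
# bc is not installed on every minimal system, so check for it first.
if command -v bc >/dev/null 2>&1; then
  echo "$sum_expr" | bc
fi
```

The same pattern (`cut`, `sort`, `uniq -c`, `sort -rn`, `head`) covers most "top N by count" questions; only the field number and the value of N change.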