Parsing the Stack Exchange data dump with Python

I need test data, and I want to use the Stack Exchange data dump. It comes in large XML files, though, so I've written a parser in Python to handle this.
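
The parser from the video is not reproduced on this page. As a rough sketch of how large XML files can be read without loading them fully into memory, the standard library's `xml.etree.ElementTree.iterparse` can stream the elements one by one. The helper name `iter_rows` and the in-memory sample are illustrative assumptions, not the author's code; the dump files store each record as a `<row>` element whose fields are attributes.

```python
import io
import xml.etree.ElementTree as ET


def iter_rows(source, tag="row"):
    """Yield the attributes of each <row> element as a dict, one at a time.

    Hypothetical helper for illustration; not the parser from the video.
    """
    # iterparse reads the input incrementally instead of building the whole tree.
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield dict(elem.attrib)
            root.clear()  # drop already-processed children to keep memory low


# Tiny in-memory stand-in for a dump file such as Posts.xml (made-up sample data).
sample = io.BytesIO(
    b'<posts>'
    b'<row Id="1" PostTypeId="1" Title="First question" />'
    b'<row Id="2" PostTypeId="2" ParentId="1" />'
    b'</posts>'
)

rows = list(iter_rows(sample))
for row in rows:
    print(row["Id"], row.get("Title"))
```

For a real dump file, a path or an open binary file object could be passed instead of the `io.BytesIO` sample.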

Comments

Nice, Andre, very useful information. Creating test data is always a big hassle.

saqibhussain

This was really helpful! Thank you so much! 🙏

donotvent

Thank you so much for explaining this! :D

abayaninja

Very nice again. Could you compare the files using sets, like in T-SQL, instead of a loop?

stackfiles = ['f1.txt', 'f2.txt']
dirfiles = ['f2.txt', 'f3.txt']

# A non-empty intersection is truthy, so no explicit bool() is needed.
if set(stackfiles) & set(dirfiles):
    print('intersection')
else:
    print('no intersection')

GroterRonald
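
The set idea in the comment above can be taken a step further. As a small sketch, using the same two example lists, the set operators also report which files are shared and which appear on only one side, roughly what `INTERSECT` and `EXCEPT` do in T-SQL. The variable names below are illustrative.

```python
stackfiles = ['f1.txt', 'f2.txt']
dirfiles = ['f2.txt', 'f3.txt']

stack_set, dir_set = set(stackfiles), set(dirfiles)

in_both = stack_set & dir_set        # comparable to INTERSECT
only_in_stack = stack_set - dir_set  # comparable to EXCEPT
only_in_dir = dir_set - stack_set    # EXCEPT with the operands swapped

print(sorted(in_both))
print(sorted(only_in_stack))
print(sorted(only_in_dir))
```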

Thanks for explaining this script. However, I am trying to convert it to run concurrently on a cluster. Any idea how to do this?

ifeanyindukwe

Just replace event with _ in the for loop definition.

MaxPaulus-ne
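
The loop that the last comment refers to is not shown on this page. Assuming it iterates over `xml.etree.ElementTree.iterparse`, which yields `(event, element)` pairs, the suggestion might look like the sketch below; the sample XML is made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Made-up stand-in for a dump file such as Tags.xml.
sample = io.BytesIO(
    b'<tags><row Id="1" TagName="python" /><row Id="2" TagName="xml" /></tags>'
)

tag_names = []
# With the default events=("end",), the event value is always "end",
# so it can be bound to the throwaway name _ instead of "event".
for _, elem in ET.iterparse(sample):
    if elem.tag == "row":
        tag_names.append(elem.get("TagName"))

print(tag_names)
```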