Python 2.7 Tutorial Pt 14

I show you how to strip HTML tags from articles you got through Website Scraping using Python.
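
A minimal sketch of the tag-stripping idea, assuming a simple regular expression is good enough for the markup involved. It is written in Python 3 syntax rather than 2.7, and the names `TAG_RE` and `strip_tags` are made up for illustration; they are not taken from the video.

```python
import re
from html import unescape

# Matches anything that looks like a tag: "<", then any run of
# characters that are not ">", then ">".
TAG_RE = re.compile(r"<[^>]+>")

def strip_tags(html_text):
    """Remove tag-like substrings, then decode entities such as &amp;."""
    no_tags = TAG_RE.sub("", html_text)
    return unescape(no_tags).strip()

snippet = "<p>Python makes <b>scraping</b> easy &amp; fun.</p>"
print(strip_tags(snippet))
```

A regex like this can misfire on comments, scripts, or attribute values that contain `>`, so a real parser such as Beautiful Soup is usually the safer choice for messy pages.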

Comments

@0Allhell View the page source in your browser to find out which tags you need to target. You can scrape anything that shows on the screen.

derekbanas
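
To illustrate the advice above about targeting tags, here is a rough sketch. The markup in `page` is a hypothetical stand-in for what "view source" might show; the real feed will use its own tags.

```python
import re

# Hypothetical markup; substitute whatever tags the real page uses.
page = """
<item>
  <title>First headline</title>
  <link>http://example.com/1</link>
</item>
<item>
  <title>Second headline</title>
  <link>http://example.com/2</link>
</item>
"""

# The non-greedy group (.*?) captures the text between each tag pair.
titles = re.findall(r"<title>(.*?)</title>", page)
links = re.findall(r"<link>(.*?)</link>", page)

for title, link in zip(titles, links):
    print(title, "->", link)
```

If the site later renames or nests its tags differently, the patterns simply stop matching and return empty lists, so checking the source again is the first thing to try.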

@entrevu To scrape anything you just need the basic concepts I covered here, plus a better understanding of regular expressions. I did a tutorial in PHP that covers advanced website scraping, called Web Design and Programming Pt 24. The regular expression explanation is identical to regex in Python. I hope that helps.

derekbanas

I actually use PHP most of the time, but Python's Beautiful Soup has improved lately and is quite good.

derekbanas

They may have changed the tags a bit. Check whether the tag around the snippet has changed.

derekbanas

I have a bunch of tutorials on scraping web pages with PHP. They are in my PHP tutorial playlist on my YouTube channel.

derekbanas

@ma1achite I use Eclipse Classic. It's free and works with almost every language.

derekbanas

Sorry, but I'd have to know more about how that information is checked.

derekbanas

@ma1achite He's using Eclipse. Google "Eclipse IDE".

entrevu

Figured it out. Now I'm just getting errors, with re.findall giving a

TypeError: Expected string or buffer

AlucardHelIsing
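
The commenter's code is not shown, so the cause here is a guess. One common way to hit this error is to pass a response or file-like object to `re.findall` instead of the text it contains. The sketch below, in Python 3 syntax, uses `io.StringIO` as a stand-in for such an object.

```python
import io
import re

# Stand-in for a file-like response object (an assumption; the
# original code may have used something else, e.g. urlopen()).
response = io.StringIO("<title>Example</title>")

try:
    # Wrong: passes the object itself, not its text.
    re.findall(r"<title>(.*?)</title>", response)
except TypeError as err:
    print("TypeError:", err)

# Right: read the text out first, then search it.
html_text = response.read()
titles = re.findall(r"<title>(.*?)</title>", html_text)
print(titles)
```

The same error appears when the argument is a list or `None`, so printing `type(...)` of whatever is handed to `re.findall` is a quick way to narrow it down.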

Hello! I am wondering whether you have, or know of, a tutorial on scraping pages that are auto-generated with JavaScript.

TheMariouka

Hello again, it's been a while. I was wondering which is the best method to use for web scraping: curl, Beautiful Soup, or get_html? For example, I can block curl requests to my site through the config.ini, so I want to start scraping but I don't know which method is the right or best one to use.

emgoldexgreeceemgoldex

My only question is how to make Eclipse recognize the BeautifulSoup download. I used 'python setup.py install' in the terminal, so where do these files have to go? Where do I have to put beautifulsoup.py or the other files that came with the install? As you would expect, in Eclipse I am getting an error:
Unresolved import: BeautifulSoup

AlucardHelIsing
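
An "unresolved import" in an IDE often means the IDE is pointed at a different Python interpreter than the one the package was installed into. That is an assumption about this case, not a confirmed diagnosis. A small sketch, in Python 3 syntax, for checking which interpreter is running and where a module is found; `locate_module` is a hypothetical helper name.

```python
import importlib.util
import sys

def locate_module(name):
    """Return the file a top-level module would load from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None

# The interpreter running this script; compare it with the one the IDE uses.
print(sys.executable)

# A standard-library module, to show what a successful lookup looks like.
print(locate_module("json"))

# The module from the comment above; prints None if this interpreter cannot see it.
print(locate_module("BeautifulSoup"))
```

Running this both from the terminal and from inside the IDE should show whether the two are using the same interpreter and search path.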

I use your exact code but I only get the links and the titles. The code fails to output the snippet of the article. Any help? Has the feed for Huffington Post changed?

theLach

Hi Derek,
I have a question: how do I pass credentials to a website I want to scrape?

sainaths

My network is behind a proxy, so when I open a webpage it asks me for a username and password. Is there any way I can store the username and password in the program itself so that it doesn't ask?
I searched and tried the urllib2 proxy handlers but got an error.

harendraSinghIIITDMJ
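
One possible approach to the proxy question, sketched in Python 3 syntax, where `urllib.request` replaces `urllib2` but keeps the same `ProxyHandler` and `build_opener` names. The host, port, and credentials below are placeholders, and this assumes the proxy accepts basic authentication.

```python
import urllib.request

# Placeholder values (assumptions); replace with the real proxy details.
PROXY_URL = "http://username:password@proxy.example.com:8080"

# Route both http and https requests through the authenticated proxy.
proxy_handler = urllib.request.ProxyHandler(
    {"http": PROXY_URL, "https": PROXY_URL}
)
opener = urllib.request.build_opener(proxy_handler)

# With a real proxy in place, a request would look like this:
# html_text = opener.open("http://example.com/").read().decode("utf-8")
```

Special characters in the password may need to be percent-encoded, and keeping credentials in source code is convenient but worth avoiding for anything shared.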

What'd you do to fix this error importing BS?

pavanjared

Hi Derek. I need your help. Do you have an email? I will write a lot. I hope you answer.

paulasf