Python Web Crawler Tutorial - 5 - Parsing HTML

Comments
Author

Those using Python 2.7 can use the following lines instead:
def __init__(self):
    HTMLParser.__init__(self)

sukkiisukant
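Author

In Python 2.7.6 this code works (mind the underscores and camelCase):
from HTMLParser import HTMLParser
from urlparse import urlparse

class linkFinder(HTMLParser):
    def __init__(self):
        # HTMLParser is an old-style class in Python 2, so call the base initializer directly
        HTMLParser.__init__(self)

    def handle_starttag(self, tag, attr):
        # Called once for every opening tag in the HTML passed to feed()
        print(tag)

    def error(self, message):
        pass

finder = linkFinder()
finder.feed("Your custom link")  # feed() expects HTML text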

pastorofmuppets
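
For readers on Python 3, the same idea can be sketched with `html.parser` as below. This is only a minimal illustration, not the code from the video: the class name `LinkCollector`, the `links` attribute, and the sample HTML are assumptions made up for the example. Instead of printing every tag, it collects the `href` of each `<a>` tag.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical example class: gathers href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and attribute
        # names are converted to lower case by the parser
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Made-up sample input; feed() takes HTML text, not a URL
sample_html = '<p>See <a href="/about">About</a> and <A HREF="https://example.com/">Example</A></p>'
collector = LinkCollector()
collector.feed(sample_html)
print(collector.links)
```

Relative links such as `/about` come back exactly as written in the page, so a crawler would still need to join them with the page's base URL.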
Author

There is a really easy way of finding links if, like me, you think this is complicated. Install the requests-html library like this:

pip install requests-html

And then use it like this:
>>> from requests_html import HTMLSession
>>> session = HTMLSession()
>>> r = session.get("https://example.com/")  # placeholder URL; use any page
>>> r.html.absolute_links

sergenpeksen
Author

What a cliffhanger! Enjoying the series! Can you suggest any good materials on Pyramid?

Slevender
Author

Hello Bucky,

I wrote it like that and everything seems OK: I don't get an error, but nothing prints out.

danielkovacs
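
When the parser runs without errors but prints nothing, two things are worth checking first. This is a guess at likely causes, not a diagnosis of the code in question (which is not shown): the handler name may be misspelled, or `feed()` may never be called with actual HTML. A minimal Python 3 sketch, with the hypothetical class name `TagPrinter` and a made-up HTML string:

```python
from html.parser import HTMLParser


class TagPrinter(HTMLParser):
    """Hypothetical example class that records and prints each start tag."""

    def __init__(self):
        super().__init__()
        self.seen = []

    # The name must be spelled exactly handle_starttag. A variant such as
    # handle_start_tag is never called; the base class's no-op runs instead,
    # so there is no error and no output.
    def handle_starttag(self, tag, attrs):
        self.seen.append(tag)
        print(tag)


printer = TagPrinter()
# Nothing is parsed until feed() receives the HTML text itself (not a URL).
printer.feed("<html><head><title>Test</title></head><body><h1>Hi</h1></body></html>")
```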
Author

Hey Bucky, love your videos! Quick question: does the html.parser library support HTML5 parsing?

bttrflysweetheart
Author

Python 2.7 code:
from HTMLParser import HTMLParser
import urllib


class LinkFinder(HTMLParser):

    def __init__(self):
        # Direct base-class call; HTMLParser is an old-style class in Python 2
        HTMLParser.__init__(self)

    def error(self, message):
        print('error' + message)

jeyko
Author

I am using Python 2.7 and I get errors. I resolved some of them, but others I just cannot figure out. Please note the things that could be different in Python 2.7 so we can follow along smoothly. Thank you for your work; it is really helpful.

Farisology
Author

Thanks for the video. How would you use this to indent a fixed HTML input? Is it possible? Thanks.

cricketer
Author

Can I use BeautifulSoup instead of this parser?

NikolaMilic
Author

How can you work with this text editor? It damages my brain: everything fades out and the underlines are just completely nuts. But thanks for the content.

DrSnej
Author

Hey Bucky, I just downloaded the newest version of PyCharm and for some reason urllib no longer has the parse method and there is no longer a class called html.parser. Does it have something to do with the Python version? The version I have running in PyCharm is Python 2.7.8.

tehlolzfactor
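
The missing modules are most likely a Python version difference rather than a PyCharm issue: `html.parser` and `urllib.parse` are the Python 3 names, while Python 2.7 ships the same functionality as `HTMLParser` and `urlparse`. A minimal sketch, assuming you want one file that imports on either version (the `LinkFinder` body here is only a stub, and `urljoin` is just an example of something to import):

```python
try:
    # Python 3 module layout
    from html.parser import HTMLParser
    from urllib.parse import urljoin
except ImportError:
    # Python 2 module layout
    from HTMLParser import HTMLParser
    from urlparse import urljoin


class LinkFinder(HTMLParser):
    def __init__(self):
        # An explicit base-class call works on both versions; in Python 2
        # HTMLParser is an old-style class, so super() is not usable there.
        HTMLParser.__init__(self)


finder = LinkFinder()
joined = urljoin("https://example.com/docs/", "page.html")
print(joined)
```

Switching the project interpreter to Python 3 avoids the need for the fallback imports altogether.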
Author

How do you make PyCharm look like that? When I change to the Darcula theme, the top bar and toolbar stay with the default theme and it looks horrible.

kennyPAGC
Author

This really doesn't show any output for me.

pravalikabasam
Author

Hey, if anyone is using Atom and getting an import error, check out this link in order to run Python 3 instead of 2.7

Jakesmithfilms