Python Scrapy Tutorial - 19 - Web Crawling & Following links

In this web crawling video, we will learn how to follow the links on a webpage and how to scrape multiple pages using Scrapy in Python.

Next video - Scraping websites with Pagination

#python
Comments

You saved my ass in the office, bro. Best tutorial playlist I've ever come across.
May you get all the happiness in life and thereafter.

saifkhanali

That is so cool! I'm blown away by how easy it was to scrape 10 pages. Thanks a lot for this playlist!!

abubakrbardien

THANK YOU! Thanks to you I just made my first web crawler(s). It was my little dream, tbh, because I had tried a couple of times in the past without success. Thanks to this series I was finally able to make crawlers for multiple online shops, and they're pretty advanced: I upgraded them to paginate not only through multiple follow links but also through product categories. This functionality was missing from the crawlers available on the internet. I even made crawlers that find and scrape category links in shops, to be used later as start_urls. Thank you again, like and sub clicked.

niktniewiem

I must admit, these Python Scrapy tutorials are very user-friendly for a beginner. It took me a few hours to learn. So well explained! Thank you for all the effort.

kewalkkarki

Hello buildwithpython!! very very useful video series!!! I learned a lot being totally new to scraping!! hats off!!!!

drvlog

It is important to explain that .get() returns None if it doesn't find any value; otherwise it returns the first element of the list.
Many thanks for your teachings

whayAl

response.follow doesn't seem to be working for me; I am still getting the first page's content in the output.

kguru

What's an attribute, and why can't you get the link using a::href? The video kind of jumped ahead in this part. I would appreciate it if someone could shed light on this "attribute" and on the other properties that different elements have.

inifin
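
For readers with the same question: an attribute is a name="value" pair written inside an element's opening tag, and href is the attribute of an `<a>` tag that holds the link target. In Scrapy's CSS selectors an attribute is read with the `::attr(name)` extension, so the usual form is `a::attr(href)`; `a::href` is not valid syntax. The sketch below uses only Python's standard library to show where attributes live in the markup. The HTML line and the class name `LinkCollector` are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs taken from the opening tag,
        # e.g. [("href", "/page/2/")] for <a href="/page/2/">.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


# Hypothetical "next page" markup.
html = '<li class="next"><a href="/page/2/">Next</a></li>'

collector = LinkCollector()
collector.feed(html)
hrefs = collector.hrefs
```

Here `class` is an attribute of the `<li>` element and `href` is an attribute of the `<a>` element; other elements carry other attributes, such as `src` on `<img>`.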

@buildwithpython, if a page link is clicked and the browser URL does not change, how do I handle this? Any help is appreciated.

ashirbad

Hey mate, these tutorials have been really helpful for me learning Python and scraping. I appreciate you putting the time in to creating them.

With this particular video, I have followed along the series up till here. When I run my scraper now at this point I am getting an error. When an item does not have valid tags it will fail to load into the DB.

An example is on page 8, the author Ayn Rand. There are no tags on that quote and this causes the entry to not get inserted into a DB.

Is there another video where you address these conditional style of issues that shows how to check for these cases and perhaps insert null values or something along those lines. I'm very new to Python and I'm happy to Google the answer but I thought maybe you've covered something like this elsewhere.

Thanks.

imyourjosh
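
One possible way to handle this, sketched under an assumption: the failure comes from taking the first element of an empty list (something like `item['tag'][0]`) when a quote has no tags. The pipeline code from the series is not shown here, so the table, column names, and the helper `first_or_none` are hypothetical.

```python
import sqlite3


def first_or_none(values):
    """Return the first element of a list, or None when the list is empty."""
    return values[0] if values else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quotes (title TEXT, author TEXT, tag TEXT)")

# Hypothetical scraped items; the second one has no tags.
items = [
    {"title": ["Quote A"], "author": ["Author A"], "tag": ["life", "books"]},
    {"title": ["Quote B"], "author": ["Author B"], "tag": []},
]

for item in items:
    # sqlite3 stores Python's None as SQL NULL, so the row is still inserted.
    conn.execute(
        "INSERT INTO quotes VALUES (?, ?, ?)",
        (
            first_or_none(item["title"]),
            first_or_none(item["author"]),
            first_or_none(item["tag"]),
        ),
    )
conn.commit()

rows = conn.execute(
    "SELECT title, author, tag FROM quotes ORDER BY rowid"
).fetchall()
```

The same guard can be applied in the spider instead, for example by passing a default to `.get()`, so that the pipeline never sees an empty value.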

Hey I was trying to extract the links of the Google search results but it seems I can't extract the href of the Google search results. Can anyone help me on this?

sedharth

I'm having trouble extracting data from a first page (books) and then following a link (author) to extract other data from there.
All of the extracted data must be saved per item (book), as each book has an author. Everything is saved to JSON, and from there I will save it to PostgreSQL or another DB.
Can I do it with only one spider?

Another question: can both parse and parse_author write to the same item?

[
  {
    "name": "A Game of Thrones",
    "description": "...",
    "price": "$10.99",
    "author": [
      {
        "name": "George R. R. Martin",
        "rating": "4.5",
        "description": "..."
      }
    ]
  }
]

adrianhelerea

Hi, thanks for the video. I have a question: I tried to follow links with Scrapy but couldn't, because every link's href is '#'.

zaferbagdu

How do I get the href of a webelement?

SoulfrikRule

Can anyone please tell me how to go inside a link (as we do by clicking on an ad/item on a website to see more details), scrape the details on that page, then come back to the previous page and repeat this for the next items in the list?

drvlog

Hey, I have a question: is there a way to add additional data (from a link) to the same CSV file? Say I want to add the bio and date of birth of each author along with the name, tags and quote from the main page.

allenalex

To get the next page, instead of response.follow I use "yield scrapy.Request(absolute_next_page_url, callback=self.parse)".

What's the difference?

Skaxarrat
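
As far as I know, the main difference is URL handling: `scrapy.Request` needs an absolute URL, while `response.follow` also accepts a relative one and resolves it against the URL of the current response before building the same kind of Request. The resolution step is roughly what `urljoin` from the standard library does, as sketched below. The URLs are placeholders.

```python
from urllib.parse import urljoin

page_url = "https://example.com/page/1/"  # stands in for response.url
next_href = "/page/2/"                    # relative href taken from the page

# scrapy.Request needs this absolute form; response.follow builds it for you.
absolute_next_page_url = urljoin(page_url, next_href)
```

So building the absolute URL yourself and passing it to `scrapy.Request` should behave the same as calling `response.follow` with the relative href.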

How do I crawl all the links on a website, be it buttons, lists, menus, etc.?

mansukhkaur

Thanks for the great video. However, it seems Scrapy cannot be installed using either PyCharm or pip, and the CSS selector extension is not available in the Chrome Web Store.

property

Hey, many thanks for a great tutorial.
Is there any possibility that you could assist me with a project I am struggling with a bit? I can pay for your time :)

leninespindola