Python Web Scraping for Beginners with Beautiful Soup - Functionalities for Scraping Project - FINAL

Python Web Scraping
#Python
IMPORTANT POINT TO READ BEFORE THE VIDEO:

👍Welcome to my Python Web Scraping Tutorial for Beginners!
In this series, you will learn how to pull any information you want from any website!

In this episode, we will turn the program from the previous episode into something more useful by prettifying it and making it more dynamic.
The functionalities we will add:
- Showing a nicer output of the results
- An option to filter out jobs that require skills you are not familiar with
- Scraping the website every n minutes
- Writing the results to separate text files
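
Two of the items above, the nicer output and the separate text files, can be sketched as follows. This is only a minimal sketch, not the code from the video: it assumes each job has already been pulled into a dictionary, and the keys `company`, `skills`, and `link`, the helper names `format_job` and `write_jobs`, and the `posts` folder default are all illustrative assumptions.

```python
from pathlib import Path


def format_job(job: dict) -> str:
    """Return one job post as labelled lines, one field per line."""
    return (
        f"Company Name: {job['company'].strip()}\n"
        f"Required Skills: {job['skills'].strip()}\n"
        f"More Info: {job['link']}\n"
    )


def write_jobs(jobs: list, folder: str = "posts") -> list:
    """Write each job to its own numbered text file and return the paths."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    paths = []
    for index, job in enumerate(jobs):
        path = out_dir / f"{index}.txt"
        # Mode "w" semantics: an existing file with the same number is overwritten.
        path.write_text(format_job(job), encoding="utf-8")
        paths.append(path)
    return paths
```

Creating the folder up front avoids a `FileNotFoundError` on the first run, and numbering files by index means a later run replaces earlier files with the same number.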

🔥 19th October - First Part

🔥 22nd October - Second Part

🔥 25th October - Third Part

Connect with me:

👍 Subscribe for more Python tutorials like this:
🔥 Comment below with other topics you would like to see tutorials on next on my channel.
-----------------------------------------------

My website:
Comments
Author

The exact timeline will be updated once the premiere begins. The topics will be:
00:00 - 05:49 - Aligning / organizing the pulled info
05:50 - 09:30 - Filtering unfamiliar skills out of the job posts
09:31 - 13:35 - Scraping the website every 10 minutes
13:36 - 20:12 - Writing the job posts to text files

Enjoy the last episode for Python Web Scraping!

jimshapedcoding
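
The "scrape the website every 10 minutes" step from the timeline above boils down to a loop that runs the scraper and then sleeps. The following is a minimal sketch under stated assumptions, not the code from the video: `run_every`, its `rounds` limit, and the injectable `sleep` parameter are hypothetical names added so the loop can be tried without actually waiting.

```python
import time


def run_every(task, minutes: float, rounds=None, sleep=time.sleep) -> int:
    """Call task(), wait `minutes` minutes, and repeat.

    rounds=None loops forever; a number stops after that many runs.
    `sleep` can be swapped out so the loop can be tried without waiting.
    """
    done = 0
    while rounds is None or done < rounds:
        task()
        done += 1
        print(f"Waiting {minutes} minutes...")
        sleep(minutes * 60)  # time.sleep expects seconds
    return done
```

For example, `run_every(find_jobs, 10)` would call a `find_jobs` function every 10 minutes until the program is interrupted.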
Author

Thank you. You have changed my perspective on Python and made me a very skilled Python developer. I watched all your videos and they helped me a lot. Thank you very much.

spyrush
Author

Hi! Love your work. Just completed the scraping series. Looking forward to more uploads. Thanks a lot.

sak
Author

Hello, I am from freeCodeCamp. I haven't watched it there, but I came here to watch just to support you. Thanks a lot.

tnthtina
Author

Thanks a lot, Jim, for this great tutorial! Your video helped me a lot to figure out how to scrape multiple saved .html files in my folder and create a .txt file for each HTML file. Thank you!

muharrembgrynk
Author

Great series, looking forward to more.

Hexbyte
Author

I had a lot of fun with this tutorial, and I hope the next episode will integrate BeautifulSoup with Selenium. Thanks a lot 🌹🌹🌹🌹

AbuAhmedAlsudani
Author

You are great, sir. Please do more about web scraping. Thanks.

OyeroIbrahim
Author

Hello sir,
I just love your scraping series. 😍😍😍
I hope you come back with some more videos, for example on saving scraped data to a file (CSV/Excel).

worldlive
Author

Hi Jim, can you add a video about inverted indexing and about indexing in detail?

naveenav
Author

This is a great tutorial. Thanks for that! Please also create a video about how to scrape data from multiple pages with pagination, and how to store the information in a DataFrame.

giorgisabadze
Author

Best in the league = "JimShapedCoding"

hell.yeahhh
Author

Great series! I have a question: how can I scrape a website that requires login first with Python? I am new to Python.

aluisjara
Author

I like your tutorial. Best teaching.
I want to upgrade pip. Please help me.

KM-jmtc
Author

May I ask how you can change this program so that it finds jobs from other pages (not just the first page)?

ksh
Author

I came from freeCodeCamp and I liked your tutorials so much. Thank you for them! Can you suggest other free advanced tutorials about web scraping, even if they are just documents?

mazen-r
Author

Hi. I finished your Selenium series. I loved it, and I love all the amazing content you have. I took it upon myself to attempt the challenge in this video. I used Counter from collections, then filtered out the multiple unfamiliar skills using set difference (sorted_skills is what remains after taking out the unfamiliar skills). So, if Counter(sorted_skills) == Counter(skills), where skills is the set of skills required for the job, I write to the file. My code is working fine. The problem is that the text files don't change/update after the first run. Copying the code into another PyCharm project allows me to run it one more time. Please help me; I have already spent 9 hours working on this and another 3 hours searching on Stack Overflow.
It would even be useful to post the solution to your GitHub account. Thanks in advance.
import time
from bs4 import BeautifulSoup
import requests
from collections import Counter

print("What skills would you like to filter out: \n")
unfamiliar_skill = input(">").lower().replace(", ", '').split()
unfamiliar_skill_set = set(unfamiliar_skill)
for i in unfamiliar_skill_set:
    print(f'Filtering out {i} as a skill.')


def find_jobs():


    soup = BeautifulSoup(html_text, 'lxml')
    jobs = soup.find_all('li', class_="clearfix job-bx wht-shd-bx")
    for index, job in enumerate(jobs):
        published_date = soup.find('span',
        if 'few' in published_date:
            company_name = job.find('h3', ", '')
            skill_to_pass = job.find('span', ", '')
            skill = job.find('span', class_='srp-skills').text.replace(', ', '').split()
            skill_set = set(skill)

            sorted_skills =

            job_description = job.find('ul', class_="list-job-dtl clearfix").li.text
            more_info = job.header.h2.a['href']
            if Counter(sorted_skills) == Counter(skill_set):
                with open(f'posts/{index}.txt', 'w') as f:

                    f.write(f'Company Name: {company_name.strip()}\n')

                    f.write(f'Required Skills: {skill_to_pass}\n')
                    f.write(f'More Info: {more_info}')

                print(f'File: {index}')


if __name__ == '__main__':
    while True:
        find_jobs()
        time_wait = 10
        print(f'Waiting {time_wait} minutes...')
        time.sleep(time_wait * 60)

iramkhan
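
The filtering idea described in the comment above (keep a job only if removing the unfamiliar skills leaves its skill set unchanged) is equivalent to checking that the two sets share no element. Here is a minimal sketch of that check with plain sets; it assumes skills arrive as a comma-separated string, and the helper names `parse_skills` and `is_familiar` are hypothetical, not taken from the video or from the commenter's code.

```python
def parse_skills(text: str) -> set:
    """Split a comma-separated skills string into a set of lowercase names."""
    return {part.strip().lower() for part in text.split(",") if part.strip()}


def is_familiar(job_skills: str, unfamiliar: set) -> bool:
    """Return True when the job requires none of the unfamiliar skills."""
    return parse_skills(job_skills).isdisjoint(unfamiliar)


unfamiliar = parse_skills("Django, Java")
print(is_familiar("python, flask, sql", unfamiliar))  # no overlap with the unfamiliar set
print(is_familiar("python, django", unfamiliar))      # django is in the unfamiliar set
```

Splitting on commas and stripping each part keeps multi-word skills such as "machine learning" intact, which splitting on whitespace would not.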
Author

Shalloum bruh, can you do some machine learning tutorials in the future?

Thanks

surkewrasoul
Author

I have found you first from freecodecamp.org, and now I like your tutorials more than

blessmusic
Author

How do I scrape links that appear only in the Network tab under Fetch/XHR and are not listed in the page source?

booiggers