RTILA - Tutorial - How ChatGPT Makes Web Scraping Easier - Generate Regex Rules with ChatGPT

This video tutorial shows how ChatGPT can help you find the correct Regex rules, and even the JavaScript code, needed to improve the quality and precision of data scraped from unstructured, messy websites.

SUMMARY:
1 - Quick Definitions
2 - Benefits of using JS/REGEX rules in RTILA
3 - How to use ChatGPT to find REGEX/JS rules

What is REGEX: A rule that isolates a text value based on its pattern.
What is the benefit: It lets you clean up a block of data and save only the exact text values you are targeting.
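To illustrate "isolating a text value based on its pattern", here is a small plain-JavaScript sketch (run outside RTILA, e.g. in Node.js) on the sample text used later in this description. It shows that a bare four-digit pattern grabs the first year it meets, which is why the rule below anchors the pattern on the "Fin construction" label.

```javascript
// Sample block taken from the example in this description.
const block = "Début construction 1864\nFin construction 1879";

// A bare pattern: any run of exactly four digits.
const anyYear = block.match(/\d{4}/);
console.log(anyYear[0]); // "1864": the first match, not the one we want

// Anchoring the pattern on the label in front of the target value.
const finYear = block.match(/Fin construction\s*\d{4}/);
console.log(finYear[0]); // "Fin construction 1879"
```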

For info & re-use:
REGEX Rule: Helps you isolate your target text. Result: Fin construction 1879
REGEX used: Fin construction\s*(\d{4})
JS Rule: Helps you further clean your data to only save what you want. Result: 1879
JS used: FIELD_VALUE = FIELD_VALUE.replace('Fin construction', '').trim();
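The two rules above work as a chain: the REGEX rule keeps "Fin construction 1879", then the JS rule edits FIELD_VALUE down to "1879". The sketch below reproduces that chain in plain JavaScript so you can check the rules before pasting them into RTILA. It assumes RTILA keeps the whole regex match and then passes it to the JS rule as FIELD_VALUE. The helpers applyRegexRule and applyJsRule are hypothetical stand-ins, not part of RTILA's API.

```javascript
// Hypothetical stand-ins for RTILA's two rule stages; RTILA itself runs
// these rules for you, and these helper names are not part of its API.
function applyRegexRule(text, pattern) {
  const match = text.match(new RegExp(pattern));
  // Keep the whole matched text, mirroring "Result: Fin construction 1879".
  return match ? match[0] : "";
}

function applyJsRule(FIELD_VALUE) {
  // The JS rule from this description, plus .trim() to drop the space
  // left behind after the label is removed.
  FIELD_VALUE = FIELD_VALUE.replace('Fin construction', '').trim();
  return FIELD_VALUE;
}

const scraped = "Début construction 1864\nFin construction 1879";
const afterRegex = applyRegexRule(scraped, "Fin construction\\s*(\\d{4})");
const afterJs = applyJsRule(afterRegex);

console.log(afterRegex); // "Fin construction 1879"
console.log(afterJs);    // "1879"

// Alternative in plain JavaScript: read capture group 1 directly.
const year = scraped.match(/Fin construction\s*(\d{4})/)?.[1];
console.log(year); // "1879"
```

Without .trim(), the replace alone would leave " 1879" with a leading space. The capture-group variant skips the second step entirely, though whether RTILA exposes capture groups is not covered in this video.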

Type a prompt like this:
JavaScript regex rule to isolate the 4-digit year that appears after Fin construction in the string below
Début construction 1864
Fin construction 1879

NB: Be aware of, and delete, the “\” or other characters that ChatGPT adds at the start and end of the rule.
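A quick way to handle this note is to clean up the answer and test it against the sample string from the prompt before pasting it into RTILA. The sketch below assumes the extra characters are the /.../ delimiters of a JavaScript regex literal (with optional flags such as g) and inline-code backticks. Anything else, such as the stray “\” mentioned in the note, still needs to be removed by hand. The function cleanGeneratedRegex is a hypothetical helper, not a RTILA or ChatGPT feature.

```javascript
// Hypothetical helper: strips the wrapping that often comes with a regex
// copied from a chat answer (inline-code backticks and /.../flags delimiters).
function cleanGeneratedRegex(answer) {
  let rule = answer.trim();
  // Remove surrounding backticks from inline code formatting.
  rule = rule.replace(/^`+|`+$/g, "").trim();
  // Remove regex-literal slashes, plus any trailing flags such as g or i.
  const literal = rule.match(/^\/(.*)\/([a-z]*)$/);
  if (literal) {
    rule = literal[1];
  }
  return rule;
}

const fromChat = "`/Fin construction\\s*(\\d{4})/g`";
const rule = cleanGeneratedRegex(fromChat);
console.log(rule); // Fin construction\s*(\d{4})

// Check the cleaned rule against the prompt's sample string.
const sample = "Début construction 1864\nFin construction 1879";
const found = sample.match(new RegExp(rule));
console.log(found ? found[0] : "no match"); // Fin construction 1879
```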
Comments:

Can RTILA scrape Craigslist listings (title and price) from a specific category, with a specific search term, automatically across all US states and their regions?

Mayanksharma-bloggupdate