Web Scraping for LLM in 2024: Jina AI Reader API, Mendable Firecrawl, and Crawl4AI and More

Prompt Engineering • June 4, 2024

Prompt Engineering

@engineerprompt

About

👋 I'm Muhammad! I am an AI/ML expert with a PhD, and I have led ML teams at startups over the last 8 years. I am also a Google Developer Expert for ML/AI. The goal of this channel is to share what I learn without the "fluff" and "hype". I offer consulting services to startups: https://calendly.com/engineerprompt/consulting-call. Learn more about me at engineerprompt.ai. For business inquiries, email: [email protected]

Video Description

In this video, we look at several web scraping tools, both free and paid. Learn how to scrape data from web pages and PDFs using Beautiful Soup, the Reader API from Jina AI, and Firecrawl from Mendable. We also discuss more advanced web scraping solutions such as ScrapeGraphAI and Crawl4AI. Aimed at people building LLM applications, this video provides practical examples and code demonstrations. Subscribe for more tutorials on building LLM applications and tools!

#webscraping #llm #parsing

🦾 Discord: https://discord.com/invite/t4eYQRUcXB
☕ Buy me a Coffee: https://ko-fi.com/promptengineering
🔴 Patreon: https://www.patreon.com/PromptEngineering
💼 Consulting: https://calendly.com/engineerprompt/consulting-call
📧 Business Contact: [email protected]
Become Member: http://tinyurl.com/y5h28s6h

💻 Pre-configured localGPT VM: https://bit.ly/localGPT (use Code: PromptEngineering for 50% off).
RAG Beyond Basics Course: https://prompt-s-site.thinkific.com/courses/rag

LINKS:
Notebook: https://tinyurl.com/5n8dcbj8
Reader API: https://jina.ai/reader/
FireCrawl: https://www.firecrawl.dev/
Crawl4AI: https://github.com/unclecode/crawl4ai
ScrapeGraphAI: https://github.com/VinciGit00/Scrapegraph-ai

TIMESTAMPS
00:00 Introduction to Data Scraping Series
00:21 Challenges of Web Data
01:32 Overview of Web Scraping Tools
01:59 Example Web Pages for Scraping
03:05 BeautifulSoup: The Baseline Approach
05:05 Reader API: Jina AI
08:21 FireCrawl: An Alternative Tool
10:42 Crawl4AI and ScrapeGraphAI
12:13 Conclusion and Next Steps

All Interesting Videos:
Everything LangChain: https://www.youtube.com/playlist?list=PLVEEucA9MYhOu89CX8H3MBZqayTbcCTMr
Everything LLM: https://youtube.com/playlist?list=PLVEEucA9MYhNF5-zeb4Iw2Nl1OKTH-Txw
Everything Midjourney: https://youtube.com/playlist?list=PLVEEucA9MYhMdrdHZtFeEebl20LPkaSmw
AI Image Generation: https://youtube.com/playlist?list=PLVEEucA9MYhPVgYazU5hx6emMXtargd4z
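
For readers who want a feel for the Reader API mentioned in the description before opening the notebook, here is a minimal sketch. It is not the code from the video. It assumes the publicly documented usage pattern of prefixing the target page URL with https://r.jina.ai/ and reading back the returned text. The helper names reader_url and fetch_markdown, the User-Agent string, and the timeout value are made up for illustration, and the sketch uses only the Python standard library.

```python
from urllib.parse import urlsplit
from urllib.request import Request, urlopen

# Assumption: the Reader endpoint is used by prefixing the target URL.
READER_PREFIX = "https://r.jina.ai/"


def reader_url(target_url: str) -> str:
    """Build the Reader request URL for a page (hypothetical helper)."""
    parts = urlsplit(target_url)
    # Reject anything that is not an absolute http(s) URL.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"expected an absolute http(s) URL, got {target_url!r}")
    return READER_PREFIX + target_url


def fetch_markdown(target_url: str, timeout: float = 30.0) -> str:
    """Fetch the LLM-friendly text for a page (hypothetical helper).

    Performs a real network request; rate limits and API keys are not
    handled in this sketch.
    """
    request = Request(
        reader_url(target_url),
        headers={"User-Agent": "reader-sketch/0.1"},  # illustrative value
    )
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling fetch_markdown("https://example.com/") would return the page content as text that can be passed on to an LLM pipeline. Only reader_url is free of network access, so it is the part that can be checked offline.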