Email Scraper with Python
Hey guys, hacktadine here. This is a small project called email-scraping. The first question you might ask is:

What Is an Email Scraper?
An email parser, or scraper, is a program designed to extract email addresses from web pages and export the results to a convenient file format, for example, Excel. To collect email addresses from the web, professional scrapers usually parse data from social networks (LinkedIn, Facebook, etc.) or forums. If a company needs to find the email addresses of legal entities, it collects the required information from those firms' corporate sites.
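Exporting the results to a spreadsheet, as mentioned above, can be sketched with Python's built-in csv module. Excel opens CSV files directly; the `emails` set here is a made-up stand-in for what the scraper collects:

```python
import csv

# Hypothetical result set, standing in for what the scraper collects
emails = {"info@example.com", "sales@example.org"}

# Write one address per row to a CSV file that Excel can open directly
with open("emails.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["email"])          # header row
    for address in sorted(emails):
        writer.writerow([address])
```

Swapping csv for a library like openpyxl would produce a native .xlsx file instead, but CSV is enough for most mailing-list workflows.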
- The program searches for and selects websites according to various parameters: subject (keywords), date of publication, location, and other criteria (you can configure their list manually).
- Your web scraper searches the selected sites for any lines containing "@" and "email".
- The application adds the matching objects to your database.
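The second step above can be sketched with the same regular expression the full script below uses. The sample HTML snippet here is made up, standing in for a downloaded page:

```python
import re

# A made-up page snippet standing in for a downloaded web page
html = '<p>Contact us at support@example.com or press@example.co.uk</p>'

# Case-insensitive pattern for things that look like email addresses
pattern = r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+"
found = set(re.findall(pattern, html, re.I))
print(found)
```

Collecting matches into a set, as the script does, deduplicates addresses that appear on more than one page.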
The source code of a basic email scraper:
```python
from bs4 import BeautifulSoup
import requests
import requests.exceptions
import urllib.parse
from collections import deque
import re

user_url = str(input('[+] Enter Target URL To Scan: '))
urls = deque([user_url])   # queue of URLs still to crawl
scraped_urls = set()       # URLs already visited
emails = set()             # unique email addresses found
count = 0

try:
    while len(urls):
        count += 1
        if count == 100:   # stop after 100 pages
            break
        url = urls.popleft()
        scraped_urls.add(url)
        parts = urllib.parse.urlsplit(url)
        base_url = '{0.scheme}://{0.netloc}'.format(parts)
        # directory of the current page, used to resolve relative links
        path = url[:url.rfind('/') + 1] if '/' in parts.path else url
        print('[%d] Processing %s' % (count, url))
        try:
            response = requests.get(url)
        except (requests.exceptions.MissingSchema, requests.exceptions.ConnectionError):
            continue
        # grab anything that looks like an email address
        new_emails = set(re.findall(r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+", response.text, re.I))
        emails.update(new_emails)
        # collect links from the page and queue any we have not seen yet
        soup = BeautifulSoup(response.text, features="lxml")
        for anchor in soup.find_all("a"):
            link = anchor.attrs['href'] if 'href' in anchor.attrs else ''
            if link.startswith('/'):
                link = base_url + link
            elif not link.startswith('http'):
                link = path + link
            if link not in urls and link not in scraped_urls:
                urls.append(link)
except KeyboardInterrupt:
    print('[-] Closing!')

for mail in emails:
    print(mail)
```
Why Do You Need an Email Scraper?
- Collect an extensive database of email addresses.
- Reduce the time spent on finding clients.
- Automate the process of an email marketing campaign.
- Track the history of actions performed.
We will see you in the next post.
Be Secure, Respect Privacy

