About Us

Welcome, everyone! Hi guys, hacktadine here. I want to personally welcome everyone to the site. I am the founder of APRESEC (A PREsistent SECurity). I like to play with tech and consider myself a tech freak. If you are visiting this blog, you are already a buddy, so feel free to join our community.

Email Scraper with Python

Hey guys, hacktadine here. This is a small project called email scraping. The first question you probably have is:

What Is an Email Scraper?

An email parser or scraper is a program designed to extract email addresses from web pages. Such a program pulls the addresses out of the pages it visits and exports the results to a convenient file format, for example, Excel. To collect email addresses from the web, professional scrapers usually parse data from social networks (LinkedIn, Facebook, etc.) or forums. If a company needs to find the email addresses of legal entities, it collects the required information from those firms' corporate sites.
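
To make that concrete, here is a minimal sketch of the idea (my own example, not part of the project below): it fetches a single page with requests, pulls out anything that looks like an address with a regular expression, and writes the result to a CSV file that Excel can open. The URL is only a placeholder.

import csv
import re

import requests

# Placeholder target page; swap in the page you actually want to scan
page_url = 'https://example.com/contact'
html = requests.get(page_url, timeout=10).text

# Roughly the same pattern the full script below uses: something@domain.tld
found = set(re.findall(r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+", html, re.I))

# CSV is a format Excel opens directly
with open('emails.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    writer.writerow(['email'])
    for address in sorted(found):
        writer.writerow([address])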


Web scrapers let you automate the process of collecting this data, and their main advantage is speed: one can find a hundred addresses in a couple of minutes. What's more, the program can save the information, process it, and present it in graphical form. In brief, email scraping consists of the following steps:
  1. The program searches for and selects websites according to various parameters: subject (keywords), date of publication, location, and other criteria (you can configure their list manually).
  2. Your web scraper searches for any lines containing "@" and "email" on the selected sites. 
  3. The application adds the matching objects to your database (a rough sketch of this step is shown below).
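
Step 3 mentions a database. The full script below only keeps the matches in a Python set, so here is a rough sketch of what the database step could look like, assuming a local SQLite file (the table name and schema are my own choices):

import sqlite3

def save_emails(addresses, db_path='emails.db'):
    # Create the table on first use; the address itself is the primary key
    conn = sqlite3.connect(db_path)
    conn.execute('CREATE TABLE IF NOT EXISTS emails (address TEXT PRIMARY KEY)')
    # INSERT OR IGNORE silently skips addresses that are already stored
    conn.executemany('INSERT OR IGNORE INTO emails (address) VALUES (?)',
                     [(a,) for a in addresses])
    conn.commit()
    conn.close()

# Example usage: save_emails({'info@example.com', 'sales@example.com'})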

 

 

The Source Code of a Basic Email Scraper

from bs4 import BeautifulSoup
import requests
import requests.exceptions
import urllib.parse
from collections import deque
import re

# Seed the crawl queue with the user-supplied starting URL
user_url = input('[+] Enter Target URL To Scan: ')
urls = deque([user_url])

scraped_urls = set()   # URLs already visited
emails = set()         # unique addresses found so far

count = 0
try:
    while len(urls):
        count += 1
        if count == 100:        # stop after about 100 pages so the crawl eventually ends
            break
        url = urls.popleft()
        scraped_urls.add(url)

        # Split the URL so relative links can be resolved later
        parts = urllib.parse.urlsplit(url)
        base_url = '{0.scheme}://{0.netloc}'.format(parts)

        # Everything up to the last '/' is the base for relative links
        path = url[:url.rfind('/')+1] if '/' in parts.path else url

        print('[%d] Processing %s' % (count, url))
        try:
            response = requests.get(url)
        except (requests.exceptions.MissingSchema, requests.exceptions.ConnectionError):
            continue   # skip URLs that are malformed or unreachable

        # Grab anything on this page that looks like an email address
        new_emails = set(re.findall(r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+", response.text, re.I))
        emails.update(new_emails)

        soup = BeautifulSoup(response.text, features="lxml")

        # Queue every link on the page that we have not seen yet
        for anchor in soup.find_all("a"):
            link = anchor.attrs['href'] if 'href' in anchor.attrs else ''
            if link.startswith('/'):
                link = base_url + link
            elif not link.startswith('http'):
                link = path + link
            if link not in urls and link not in scraped_urls:
                urls.append(link)
except KeyboardInterrupt:
    print('[-] Closing!')

for mail in emails:
    print(mail)
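
As mentioned earlier, a scraper can also process what it collects instead of just listing it. As one small illustration (my own addition, meant to be appended after the final loop above), the emails set can be summarized per domain:

from collections import Counter

# Count how many addresses were found for each domain
domains = Counter(mail.split('@')[-1].lower() for mail in emails)
for domain, how_many in domains.most_common():
    print('%s: %d address(es)' % (domain, how_many))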

 

Why Do You Need an Email Scraper?

An email scraper lets you:
  • collect an extensive database of email addresses;
  • reduce the time spent on finding clients;
  • automate the process of an email marketing campaign;
  • track the history of actions performed. 

 

We will see you in the next post.

Be Secure, Respect Privacy

 
