Synchronize your Medium blog with Python

Screenshot from my homepage, antonioblago.com.
from bs4 import BeautifulSoup ## for web scraping
import urllib.request
import re ## regex
from linkpreview import link_preview
import pandas as pd
url = "<your medium blog>"  ## replace with the URL of your Medium blog
req = urllib.request.Request(url, headers={'User-agent': 'your bot 0.1'})
response = urllib.request.urlopen(req)
html = response.read()
# Parse the response
soup = BeautifulSoup(html, 'html.parser')
# Find all <a> tags
text = soup.find_all("a")
Screenshot by author
list_urls = []
for item in text:
    ## convert the tag to a string
    item = str(item)
    pos = item.find('href="/')
    ## str.find returns -1 (not None) when nothing is found,
    ## so checking pos > 0 is enough to know a link is present
    if pos > 0 and item.find("user_profile") > 0:
        print(item)
Screenshot by author
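The filter step can be tried on plain strings without fetching any page. The following is a minimal sketch: the helper name `is_post_link` and both sample tags are hypothetical, and it uses the `in` operator instead of `find`, which should select the same tags here.

```python
def is_post_link(item):
    """Return True if the tag string has a relative href and the user_profile marker."""
    return 'href="/' in item and "user_profile" in item


# Hypothetical tag strings, shaped like the links on a Medium profile page.
post = '<a href="/my-first-post-1a2b3c?source=user_profile-----0-">My first post</a>'
external = '<a href="https://example.com/about">About</a>'

# str.find returns -1 when the substring is missing, never None.
missing = external.find('href="/')

print(is_post_link(post), is_post_link(external), missing)
```

Only tags that pass this check are handed to the regex step.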
result = re.search(r'href="\/+([a-z])\w+', item)
if result:
    print(item)

Excursus: regex (regular expressions)

Screenshot by author
Screenshot by author
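The pattern used in this article can be read piece by piece. Here is a small sketch that applies it to two sample tags. Both tags are made up for illustration and are not taken from a real page.

```python
import re

# The pattern from the article, split into its parts:
#   href="   literal text that opens the link attribute
#   \/+      one or more forward slashes
#   ([a-z])  exactly one lowercase letter (captured as group 1)
#   \w+      one or more word characters (letters, digits, underscore)
PATTERN = r'href="\/+([a-z])\w+'

# Hypothetical anchor tags for illustration.
sample = '<a class="x" href="/my-first-post-1a2b3c?source=user_profile-----0-">My first post</a>'
external = '<a href="https://example.com/about">About</a>'

match = re.search(PATTERN, sample)
matched_text = match.group(0)   # the whole matched text
first_letter = match.group(1)   # the single captured letter
start = match.start()           # position where href=" begins

# The external link has no slash right after href=", so nothing matches.
no_match = re.search(PATTERN, external)

print(matched_text, first_letter, start, no_match)
```

Because a hyphen is not a word character, the match stops at the first hyphen of the slug. That is enough to detect a relative link and to find where it starts.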

Let's continue

## this part runs inside the loop over the <a> tags, where item is the tag as a string
result = re.search(r'href="\/+([a-z])\w+', item)
if result:
    try:
        start = result.start()  ## start of pattern
        end_item = item[start:]
        end = re.search(r'-"', end_item).end()  ## end of pattern

        url_extract_pos = item[start:start+end]  ## cut out url
        url = "http://antonioblago.medium.com" + url_extract_pos[6:-1]

        if url not in list_urls:
            list_urls.append(url)
            print(url)
    except AttributeError:
        ## no closing -" found in this tag, so skip it
        pass
Screenshot by author
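The slicing step is easier to follow when it is wrapped in a function and run on a sample tag. This is only a sketch: the function name `extract_post_url` and the sample strings are hypothetical, and it returns `None` instead of relying on `try`.

```python
import re

BASE_URL = "http://antonioblago.medium.com"  # base URL used in the article


def extract_post_url(item, base_url=BASE_URL):
    """Return the absolute post URL found in an <a> tag string, or None."""
    result = re.search(r'href="\/+([a-z])\w+', item)
    if result is None:
        return None
    start = result.start()
    end_match = re.search(r'-"', item[start:])
    if end_match is None:
        return None
    # href_attr looks like: href="/slug-...-"
    href_attr = item[start:start + end_match.end()]
    # [6:-1] drops the leading href=" (6 characters) and the trailing quote
    return base_url + href_attr[6:-1]


# Hypothetical tag strings for illustration.
post = '<a href="/my-first-post-1a2b3c?source=user_profile-----0-">My first post</a>'
about = '<a href="/about">About</a>'

print(extract_post_url(post))
print(extract_post_url(about))
```

The second tag has a relative link but no closing `-"`, so it is skipped.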
list_of_links = []
for i in list_urls:
    preview = link_preview(i)

    dic_preview = {"title": preview.title,
                   "description": preview.description,
                   "image": preview.image,
                   "force_title": preview.force_title,
                   "absolute_image": preview.absolute_image,
                   "url": i}
    list_of_links.append(dic_preview)
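The script imports pandas, but the code shown here never uses it. One plausible next step is to turn the list of dictionaries into a DataFrame. This is an assumption, since the article does not show this step, and the records below are made up for illustration.

```python
import pandas as pd

# Hypothetical records with the same keys as the dictionaries built from link_preview.
list_of_links = [
    {"title": "Post A", "description": "First post", "image": "/a.png",
     "force_title": "Post A", "absolute_image": "https://example.com/a.png",
     "url": "https://example.com/a"},
    {"title": "Post B", "description": "Second post", "image": "/b.png",
     "force_title": "Post B", "absolute_image": "https://example.com/b.png",
     "url": "https://example.com/b"},
    {"title": "Post A", "description": "First post", "image": "/a.png",
     "force_title": "Post A", "absolute_image": "https://example.com/a.png",
     "url": "https://example.com/a"},
]

# One row per link, one column per key.
df = pd.DataFrame(list_of_links)

# Drop rows that point to the same URL, in case a link was collected twice.
df = df.drop_duplicates(subset="url").reset_index(drop=True)

print(df[["title", "url"]])
```

From there the table could be saved or rendered on a homepage, depending on how the site is built.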

I am a data analyst discovering the unlimited world of coding and data.

Antonio Blago
