Scraping data with Scrapy and PostgreSQL, executed with a scheduler

Daniel Chu
3 min read · Mar 13, 2021
Photo by Emile Perron on Unsplash

Introduction

In the following tutorial, I will use Scrapy in Python to retrieve stock information from Alpha Vantage, store it in PostgreSQL, and run the job with a scheduler.

Prerequisite

The following software/technologies will be used:

  1. Scrapy
  2. APScheduler
  3. PostgreSQL
  4. psycopg2
  5. pandas
  6. SQLAlchemy
  7. Docker

The versions used are listed below:

$ python3 --version
Python 3.8.6
$ pip3 freeze | grep APScheduler
APScheduler==3.7.0
$ pip3 freeze | grep psycopg2-binary
psycopg2-binary==2.8.6
$ pip3 freeze | grep pandas
pandas==1.2.3
$ scrapy version
Scrapy 2.4.1
$ pip3 show sqlalchemy
Name: SQLAlchemy
Version: 1.3.23

Before you begin

  1. Install Docker on your workstation from here.
  2. Clone the GitHub repo, navigate into it, and run the commands below:
# install the python packages
pip3 install -r requirements.txt
# Start up local PostgreSQL server
docker run --name myPostgresSqlServer -p 5432:5432 -e POSTGRES_PASSWORD=weakpassword -d postgres

3. Go to Alpha Vantage to get your API key and paste it into the repo file create-ini.py.

Directory Explanation

.
├── README.md
├── docker-compose # Ignore if you are not using docker compose
│ ├── docker-compose.yml
│ └── script.sh
├── scrapy
│ ├── config.ini # Paste your API key in this file
│ ├── myProject
│ │ ├── database.py
│ │ ├── pipeline.py
│ │ └── trim_data_gen_2.py
│ ├── requirements.txt
│ └── scheduler_gen_2.py # The main program entry
└── validate
└── test.py
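The scraping and trimming logic lives under myProject/ (trim_data_gen_2.py and pipeline.py). As an illustration of the general shape only — the function name and the repo's exact logic are my assumptions — here is a minimal sketch of flattening an Alpha Vantage `TIME_SERIES_MONTHLY_ADJUSTED` JSON payload into rows matching the columns shown in the validation output later:

```python
# A sketch of the transformation step; the repo's trim_data_gen_2.py /
# pipeline.py are the real implementation. Assumes the Alpha Vantage
# TIME_SERIES_MONTHLY_ADJUSTED JSON layout.
def parse_monthly(payload):
    """Flatten an Alpha Vantage monthly-adjusted payload into row dicts."""
    symbol = payload["Meta Data"]["2. Symbol"]
    series = payload["Monthly Adjusted Time Series"]
    rows = []
    for date, values in series.items():
        rows.append({
            "transaction_date": date,
            "symbol": symbol,
            "volume": float(values["6. volume"]),
            "open_value": float(values["1. open"]),
            "high_value": float(values["2. high"]),
            "low_value": float(values["3. low"]),
            "price": float(values["4. close"]),
            "dividend_amount": float(values["7. dividend amount"]),
        })
    return rows
```

A Scrapy spider would yield these dicts as items, and the pipeline would hand them to the database layer.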

Run the program

Please remember to update the Alpha Vantage API key in config.ini before you run the program.

  • Execute the following command to run the program
cd scrapy
python3 scheduler_gen_2.py

Validation

  • Execute the following command with pandas to check the result
$ python3 validate/test.py
id transaction_date symbol volume open_value high_value low_value price dividend_amount
....
60 61 2009-09-03 WCIMQ 12962030.0 0.0290 0.0051 0.0160 0.0053 0.0
61 62 2009-08-31 WCIMQ 23353802.0 0.0550 0.0051 0.0053 0.0220 0.0
62 63 2009-07-31 WCIMQ 1171142.0 0.0390 0.0220 0.0220 0.0300 0.0
63 64 2009-06-30 WCIMQ 4127518.0 0.0575 0.0200 0.0375 0.0420 0.0
64 65 2009-05-29 WCIMQ 1824121.0 0.0700 0.0400 0.0420 0.0400 0.0
65 66 2009-04-30 WCIMQ 3051707.0 0.0500 0.0200 0.0400 0.0300 0.0
66 67 2009-03-31 WCIMQ 1959119.0 0.0700 0.0250 0.0275 0.0500 0.0
67 68 2009-02-27 WCIMQ 784978.0 0.0700 0.0400 0.0550 0.0600 0.0
68 69 2009-01-30 WCIMQ 2228852.0 0.0750 0.0390 0.0600 0.0390 0.0
69 70 2008-12-31 WCIMQ 6643793.0 0.1000 0.0220 0.0390 0.0750 0.0
70 71 2008-11-28 WCIMQ 4419475.0 0.1700 0.0500 0.0800 0.1300 0.0

The data is successfully inserted into PostgreSQL!
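A script like validate/test.py essentially reads the table back with pandas. A minimal sketch of that check might look like the following — the table name `stock_data` is an assumption, so use the name the repo actually creates:

```python
import pandas as pd

def read_back(engine, table="stock_data"):
    """Read the scraped rows back from the database to eyeball them."""
    return pd.read_sql_table(table, engine)

# Against the Dockerized server started earlier it would be used like:
#   from sqlalchemy import create_engine
#   engine = create_engine(
#       "postgresql+psycopg2://postgres:weakpassword@localhost:5432/postgres")
#   print(read_back(engine).tail(10))
# (the table name "stock_data" is an assumption -- use the repo's actual name)
```

Printing the tail of the DataFrame gives output in the shape shown above.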

Conclusion

This program may come in handy when you need to perform a repeated task, or fetch the latest data and insert it into a database.

Using SQLAlchemy makes the developer's life easier, since you do not have to check beforehand whether the database/table exists.
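For instance, writing a DataFrame through a SQLAlchemy engine with `to_sql(..., if_exists="append")` creates the table on the first insert and appends on later runs, with no separate existence check. A small self-contained sketch (an in-memory SQLite engine stands in for the project's PostgreSQL one, and the table name is illustrative):

```python
import pandas as pd
from sqlalchemy import create_engine, inspect

# In-memory SQLite for demonstration; the same to_sql call works
# unchanged against the PostgreSQL engine used in the project.
engine = create_engine("sqlite://")

df = pd.DataFrame({"symbol": ["WCIMQ"], "price": [0.13]})
# if_exists="append" creates the table when missing, appends otherwise --
# no manual "does this table exist?" check is needed.
df.to_sql("stock_data", engine, if_exists="append", index=False)

print("stock_data" in inspect(engine).get_table_names())  # True
```

Running the same insert repeatedly just keeps appending rows, which is exactly what the scheduled crawl relies on.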

You can clone the GitHub repo here.

If you have any questions or suggestions, feel free to comment below!

See you next time!
