Prerequisites
To follow the steps on this page:
- Create a target service with the Real-time analytics capability enabled. You need your connection details. This procedure also works for self-hosted deployments.
- Install Python 3.
- Sign up for Twelve Data. The free tier is sufficient for this tutorial.
- Make a note of your Twelve Data API key.
Connect to the websocket server
When you connect to the Twelve Data API through a websocket, you create a persistent connection between your computer and the websocket server. You set up a Python environment, and pass two arguments to create a websocket object and establish the connection.
Set up a new Python environment
Create a new Python virtual environment for this project and activate it. All the packages you need to complete this tutorial are installed in this environment.
- Create and activate a Python virtual environment.
- Install the Twelve Data Python wrapper library with websocket support. This library allows you to make requests to the API and maintain a stable websocket connection.
- Install Psycopg2 so that you can connect to the database from your Python script.
Create the websocket connection
A persistent connection between your computer and the websocket server is used to receive data for as long as the connection is maintained. You need to pass two arguments to create a websocket object and establish the connection.
Websocket arguments
- on_event: This argument needs to be a function that is invoked whenever a new data record is received from the websocket. This is where you implement the ingestion logic, so that whenever new data is available you insert it into the database.
- symbols: This argument needs to be a list of stock ticker symbols (for example, MSFT) or crypto trading pairs (for example, BTC/USD). When using a websocket connection you always need to subscribe to the events you want to receive. You can do this by using the symbols argument or, if your connection is already created, you can use the subscribe() function to get data for additional symbols.
To create and test the connection:
- Create a new Python file called websocket_test.py and connect to the Twelve Data servers, replacing <YOUR_API_KEY> with your API key.
- Run the Python script.
- When you run the script, the server first responds with the status of your connection. After the connection is established, wait a few seconds for data records to arrive.
Each price event gives you multiple data points about the given trading pair, such as the name of the exchange and the current price. You can also occasionally see heartbeat events in the response; these events signal the health of the connection over time. At this point the websocket connection is working and passing data.
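The connection steps above can be sketched roughly as follows. This is a minimal sketch, assuming the twelvedata package exposes TDClient and a websocket() helper with connect() and heartbeat() methods, as described in that library's README. The describe_event helper and the event keys it reads (event, symbol, price) are illustrative assumptions, so check them against the messages you actually receive.

```python
import time

messages_history = []


def describe_event(event):
    """Hypothetical helper: one-line summary of a websocket message.

    The "event", "symbol", and "price" keys are assumptions about the payload.
    """
    kind = event.get("event")
    if kind == "price":
        return f"{event.get('symbol')} @ {event.get('price')}"
    return f"[{kind}] non-price message"


def on_event(event):
    """Invoked by the websocket client for every incoming message."""
    messages_history.append(event)
    print(describe_event(event))


def run(api_key, symbols=("BTC/USD", "AAPL")):
    """Open the websocket and keep it alive; blocks until interrupted."""
    # Imported here so the helpers above can be used without the package installed.
    from twelvedata import TDClient

    td = TDClient(apikey=api_key)
    ws = td.websocket(symbols=list(symbols), on_event=on_event)
    ws.connect()
    while True:
        print("messages received:", len(messages_history))
        ws.heartbeat()  # periodic heartbeat keeps the connection open
        time.sleep(10)
```

To try it, call run("<YOUR_API_KEY>") at the end of websocket_test.py with your own key substituted.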
The real-time dataset
To ingest the data into your database, you need to implement the on_event function.
After the websocket connection is set up, you can use the on_event function
to ingest data into the database. This is a data pipeline that ingests real-time
financial data into your database.
Stock trades are ingested in real-time Monday through Friday, typically during
normal trading hours of the New York Stock Exchange (9:30 AM to
4:00 PM Eastern Time).
Optimize time-series data in hypertables
Hypertables are tables that automatically partition your time-series data by time. Time-series data represents the way a system, process, or behavior changes over time. Hypertables enable the database to work efficiently with time-series data. Each hypertable is made up of child tables called chunks. Each chunk is assigned a range of time, and only contains data from that range. When you run a query, the database identifies the correct chunk and runs the query on it, instead of going through the entire table.
Hypertables use a hybrid row-columnar storage engine. Traditional databases force a trade-off between fast inserts (row-based storage) and efficient analytics (columnar storage). The hybrid engine eliminates this trade-off, allowing real-time analytics without sacrificing transactional capabilities. It dynamically stores data in the most efficient format for its lifecycle:
- Row-based storage for recent data: the most recent chunk (and possibly more) is always stored in row format, ensuring fast inserts, updates, and low-latency single record queries. Additionally, row-based storage is used as a write-through for inserts and updates to columnar storage.
- Columnar storage for analytical performance: chunks are automatically compressed into columnar format, optimizing storage efficiency and accelerating analytical queries.
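The way time ranges map to chunks can be illustrated with a toy model. The sketch below is not how the database is implemented: it only assumes fixed-width chunks (seven days here, an assumed interval) and shows why a query bounded by time needs to read only a few of them. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def chunk_index(ts, interval=timedelta(days=7)):
    """Index of the fixed-width time range (chunk) that contains ts."""
    return (ts - EPOCH) // interval


def chunks_for_query(start, end, interval=timedelta(days=7)):
    """Indexes of every chunk a query over [start, end] has to read."""
    first = chunk_index(start, interval)
    last = chunk_index(end, interval)
    return list(range(first, last + 1))
```

In this model, a query that covers fifteen days touches three chunks, regardless of how many other chunks the table holds.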
- Connect to your service. Open an SQL editor, or connect to your service using psql.
- Create a hypertable to store the real-time stock data. If you are self-hosting TimescaleDB v2.19.3 and below, create a relational table, then convert it using create_hypertable. You then enable the hybrid row-columnar storage with a call to ALTER TABLE.
- Create an index to support efficient queries. Index on the symbol and time columns.
Create standard Postgres tables for relational data
When you have other relational data that enhances your time-series data, you can create standard tables just as you would normally. For this dataset, there is one other table of data called company.
- Add a table to store the company data.
You now have two tables: one hypertable named stocks_real_time, and one regular table named company.
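The table setup described above might look like the following sketch. The table names and the symbol and time columns come from the text; the price, day_volume, and name columns, the index name, and the create_schema helper are assumptions. It uses the create_hypertable conversion path mentioned for older self-hosted versions, and expects an already open DB-API connection such as one from Psycopg2.

```python
# Statements run in order: hypertable, index, then the plain relational table.
SCHEMA_STATEMENTS = [
    """
    CREATE TABLE IF NOT EXISTS stocks_real_time (
        time TIMESTAMPTZ NOT NULL,
        symbol TEXT NOT NULL,
        price DOUBLE PRECISION,   -- assumed column
        day_volume BIGINT         -- assumed column
    )
    """,
    # Convert the relational table into a hypertable partitioned on "time".
    "SELECT create_hypertable('stocks_real_time', 'time', if_not_exists => TRUE)",
    # Index on the symbol and time columns, newest rows first.
    "CREATE INDEX IF NOT EXISTS ix_symbol_time ON stocks_real_time (symbol, time DESC)",
    """
    CREATE TABLE IF NOT EXISTS company (
        symbol TEXT NOT NULL,
        name TEXT NOT NULL        -- assumed column
    )
    """,
]


def create_schema(conn):
    """Run every statement on an open DB-API connection, then commit."""
    with conn.cursor() as cur:
        for statement in SCHEMA_STATEMENTS:
            cur.execute(statement)
    conn.commit()
```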
When you ingest data into a transactional database, it is more
efficient to insert data in batches rather than inserting data row-by-row. Using
one transaction to insert multiple rows can significantly increase the overall
ingest capacity and speed of your database.
Batching in memory
A common practice to implement batching is to store new records in memory first, then after the batch reaches a certain size, insert all the records from memory into the database in one transaction. The ideal batch size isn't universal, but you can experiment with different batch sizes (for example, 100, 1000, 10000, and so on) and see which one fits your use case better. Batching is a fairly common pattern when ingesting data into the database from Kafka, Kinesis, or websocket connections.
You can implement a batching solution in Python with Psycopg2. You implement the ingestion logic within the on_event function, which you then pass to the websocket object.
This function needs to:
- Check if the item is a data item, and not websocket metadata.
- Adjust the data so that it fits the database schema, including the data types, and order of columns.
- Add it to the in-memory batch, which is a list in Python.
- If the batch reaches a certain size, insert the data, and reset or empty the list.
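The four steps above can be sketched as a small class. This is an illustration under stated assumptions, not the tutorial's own script: BatchIngester and insert_rows are hypothetical names, the event keys (event, timestamp, symbol, price, day_volume) are assumed, and the timestamp is assumed to be Unix epoch seconds. The database write is passed in as a callable so that the batching logic can be exercised without a database.

```python
from datetime import datetime, timezone


class BatchIngester:
    """Collects price events in memory and hands them off in batches."""

    def __init__(self, insert_rows, max_batch_size=100):
        self.insert_rows = insert_rows      # callable that writes a list of rows
        self.max_batch_size = max_batch_size
        self.batch = []

    def on_event(self, event):
        # 1. Skip websocket metadata such as status and heartbeat messages.
        if event.get("event") != "price":
            return
        # 2. Shape the record to match the column order and data types.
        row = (
            datetime.fromtimestamp(event["timestamp"], tz=timezone.utc),
            event["symbol"],
            event["price"],
            event.get("day_volume"),
        )
        # 3. Add it to the in-memory batch.
        self.batch.append(row)
        # 4. Write and reset once the batch is full.
        if len(self.batch) >= self.max_batch_size:
            self.flush()

    def flush(self):
        """Write any pending rows, then start a new empty batch."""
        if self.batch:
            self.insert_rows(self.batch)
            self.batch = []
```

Passing the bound method ingester.on_event as the on_event argument keeps the batch state inside the object. Calling flush() on shutdown avoids losing a partially filled batch.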
Ingesting data in real-time
- Update the Python script so that it prints out the current batch size, which lets you follow when data gets ingested from memory into your database. Use the <HOST>, <PASSWORD>, and <PORT> details for the service where you want to ingest the data, and your API key from Twelve Data.
- Run the script.
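One way to wire these pieces together is sketched below. It assumes psycopg2.extras.execute_values for the multi-row insert and the twelvedata TDClient websocket helper. The tsdbadmin user, the sslmode setting, the column list, and the event keys are assumptions to adapt to your own service; tsdb is the database name used later on this page.

```python
import time
from datetime import datetime, timezone

# execute_values expands the single %s placeholder into many rows.
INSERT_SQL = "INSERT INTO stocks_real_time (time, symbol, price, day_volume) VALUES %s"


def connection_params(host, port, password, dbname="tsdb", user="tsdbadmin"):
    """Keyword arguments for psycopg2.connect(); user and sslmode are assumed."""
    return {"host": host, "port": int(port), "dbname": dbname,
            "user": user, "password": password, "sslmode": "require"}


def run(api_key, params, symbols, max_batch_size=100):
    """Stream price events and insert them in batches; blocks until interrupted."""
    # Third-party imports live here so the helpers above need no extra packages.
    import psycopg2
    from psycopg2.extras import execute_values
    from twelvedata import TDClient

    conn = psycopg2.connect(**params)
    batch = []

    def on_event(event):
        if event.get("event") != "price":
            return  # ignore status and heartbeat messages
        batch.append((datetime.fromtimestamp(event["timestamp"], tz=timezone.utc),
                      event["symbol"], event["price"], event.get("day_volume")))
        print(f"Current batch size: {len(batch)}")
        if len(batch) >= max_batch_size:
            with conn.cursor() as cur:
                execute_values(cur, INSERT_SQL, batch)
            conn.commit()  # one transaction per batch
            batch.clear()

    ws = TDClient(apikey=api_key).websocket(symbols=symbols, on_event=on_event)
    ws.connect()
    while True:
        ws.heartbeat()
        time.sleep(10)
```

A call such as run("<YOUR_API_KEY>", connection_params("<HOST>", "<PORT>", "<PASSWORD>"), ["BTC/USD", "AAPL"]) starts the pipeline. The printed batch size drops back to 1 after each insert.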
Query the data
To look at open, high, low, close, and volume (OHLCV) values, the most effective way is to create a continuous aggregate. You can create a continuous aggregate to aggregate data for each hour, then set the aggregate to refresh every hour, and aggregate the last two hours' worth of data.
Creating a continuous aggregate
- Connect to the tsdb database that contains the Twelve Data stocks dataset.
- At the psql prompt, create the continuous aggregate to aggregate data every hour. When you create the continuous aggregate, it refreshes by default.
- Set a refresh policy to update the continuous aggregate every hour, if there is new data available in the hypertable for the last two hours.
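The two statements described above might look like the following, wrapped in Python so that they can be run with Psycopg2 instead of psql. The view name one_hour_candle, the price and day_volume columns, and the exact offsets are assumptions: a start offset of three hours with an end offset of one hour is one way to cover a two-hour window. The statements use the TimescaleDB functions time_bucket, first, last, and add_continuous_aggregate_policy.

```python
CREATE_CANDLE_VIEW = """
CREATE MATERIALIZED VIEW one_hour_candle
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', time) AS bucket,
    symbol,
    first(price, time) AS "open",
    max(price) AS high,
    min(price) AS low,
    last(price, time) AS "close",
    last(day_volume, time) AS day_volume
FROM stocks_real_time
GROUP BY bucket, symbol
"""

ADD_REFRESH_POLICY = """
SELECT add_continuous_aggregate_policy('one_hour_candle',
    start_offset => INTERVAL '3 hours',
    end_offset => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour')
"""


def create_candle_aggregate(conn):
    """Create the hourly OHLCV aggregate and its refresh policy."""
    # Creating a continuous aggregate typically cannot run inside a
    # transaction block, so switch the connection to autocommit first.
    conn.autocommit = True
    with conn.cursor() as cur:
        cur.execute(CREATE_CANDLE_VIEW)
        cur.execute(ADD_REFRESH_POLICY)
```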
Query the continuous aggregate
When you have your continuous aggregate set up, you can query it to get the OHLCV values.
- Connect to the database that contains the Twelve Data stocks dataset.
- At the psql prompt, use a query to select all AAPL OHLCV data for the past 5 hours, by time bucket. The result contains one row for each time bucket.
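A query of this shape could be issued from Python as sketched below. The view name one_hour_candle and its column names are assumptions that must match whatever continuous aggregate you created. The symbol and lookback window are passed as query parameters; Psycopg2 adapts a timedelta to a Postgres interval.

```python
from datetime import timedelta

CANDLE_QUERY = """
SELECT bucket, symbol, "open", high, low, "close", day_volume
FROM one_hour_candle
WHERE symbol = %s
  AND bucket >= now() - %s
ORDER BY bucket
"""


def fetch_candles(conn, symbol="AAPL", lookback=timedelta(hours=5)):
    """Return the OHLCV rows for one symbol, oldest bucket first."""
    with conn.cursor() as cur:
        cur.execute(CANDLE_QUERY, (symbol, lookback))
        return cur.fetchall()
```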
Visualize the OHLCV data in Grafana
You can visualize the OHLCV data that you created using the queries in Grafana.
Graph OHLCV data
When you have extracted the raw OHLCV data, you can use it to graph the result in a candlestick chart, using Grafana. To do this, you need to have Grafana set up to connect to your database.
- Ensure you have Grafana installed, and that the TimescaleDB database that contains the Twelve Data dataset is set up as a data source.
- In Grafana, from the Dashboards menu, click New Dashboard. In the New Dashboard page, click Add a new panel.
- In the Visualizations menu in the top right corner, select Candlestick from the list. Ensure you have set the Twelve Data dataset as your data source.
- Click Edit SQL and paste in the query you used to get the OHLCV values.
- In the Format as section, select Table.
- Adjust elements of the table as required, and click Apply to save your graph to the dashboard.