- Ingest data into a service: set up and connect to a service, create tables and hypertables, and ingest data.
- Query your data: obtain information, including finding the most recent transactions on the blockchain, and gathering information about the transactions using aggregation functions.
- Compress your data using hypercore: compress data that is no longer needed for the highest-performance queries, but is still accessed regularly for real-time analytics.
Prerequisites
To follow the steps on this page:
- Create a target service with the real-time analytics capability enabled. You need your connection details. This procedure also works for self-hosted TimescaleDB.
Ingest data into a service
This tutorial uses a dataset that contains Bitcoin blockchain data for the past five days, stored in a hypertable named transactions.
Optimize time-series data using hypertables
Hypertables are tables that automatically partition your time-series data by time. Time-series data represents the way a system, process, or behavior changes over time. Hypertables enable the database to work efficiently with time-series data. Each hypertable is made up of child tables called chunks. Each chunk is assigned a range of time, and only contains data from that range. When you run a query, the database identifies the correct chunk and runs the query on it, instead of going through the entire table.

Hypercore is the hybrid row-columnar storage engine used by hypertables. Traditional databases force a trade-off between fast inserts (row-based storage) and efficient analytics (columnar storage). Hypercore eliminates this trade-off, allowing real-time analytics without sacrificing transactional capabilities. Hypercore dynamically stores data in the most efficient format for its lifecycle:
- Row-based storage for recent data: the most recent chunk (and possibly more) is always stored in the rowstore, ensuring fast inserts, updates, and low-latency single-record queries. Additionally, row-based storage is used as a write-through for inserts and updates to columnar storage.
- Columnar storage for analytical performance: chunks are automatically compressed into the columnstore, optimizing storage efficiency and accelerating analytical queries.
- Connect to your service. In the Console, open an SQL editor. The in-Console editors display the query speed. You can also connect to your service using psql.
- Create a hypertable for your time-series data using CREATE TABLE. For efficient queries on data in the columnstore, remember to segment your data by the column you will use most often to filter it. If you are self-hosting TimescaleDB v2.19.3 and below, create a relational table, then convert it to a hypertable using create_hypertable. You then enable hypercore with a call to ALTER TABLE. A sketch of these statements, together with the indexes from the following steps, appears after this procedure.
- Create an index on the hash column to make queries for individual transactions faster.
- Create an index on the block_id column to make block-level queries faster. When you create a hypertable, it is partitioned on the time column and an index on the time column is created automatically. However, you often filter time-series data on other columns as well, and additional indexes improve the performance of those queries.
- Create a unique index on the time and hash columns to make sure you don't accidentally insert duplicate records.
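The SQL for these steps is not reproduced above. The following is a minimal sketch, assuming a hypothetical column layout chosen to match the queries later in this tutorial (the real dataset's column names and types may differ), and using the create_hypertable route described for self-hosted TimescaleDB v2.19.3 and below:

```sql
-- Hypothetical schema: column names and types are assumptions based on
-- the queries later in this tutorial, not the exact tutorial schema.
CREATE TABLE transactions (
   time             TIMESTAMPTZ      NOT NULL,
   block_id         INT,
   hash             TEXT,
   weight           INT,
   is_coinbase      BOOLEAN,
   output_total     BIGINT,
   output_total_usd DOUBLE PRECISION,
   fee              BIGINT,
   fee_usd          DOUBLE PRECISION
);

-- Partition the table by the time column (the route mentioned above for
-- self-hosted TimescaleDB v2.19.3 and below).
SELECT create_hypertable('transactions', 'time');

-- Enable columnar compression, segmenting by the column used most often
-- to filter the data and ordering by time descending.
ALTER TABLE transactions SET (
   timescaledb.compress,
   timescaledb.compress_segmentby = 'block_id',
   timescaledb.compress_orderby   = 'time DESC'
);

-- Index for looking up individual transactions by hash.
CREATE INDEX hash_idx ON transactions (hash);

-- Index for block-level queries.
CREATE INDEX block_idx ON transactions (block_id);

-- Unique index to prevent accidental duplicate records.
CREATE UNIQUE INDEX time_hash_idx ON transactions (time, hash);
```

On newer versions that create hypertables directly with CREATE TABLE, the partitioning and columnstore settings can be declared there instead; check the documentation for your version.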
Load financial data
The dataset contains around 1.5 million Bitcoin transactions, covering five days of trades. It includes information about each transaction, along with its value in satoshi. It also states whether a transaction is a coinbase transaction, and the reward a coin miner receives for mining the block. To ingest data into the tables that you created, you need to download the dataset and copy the data to your database.
- Download the bitcoin_sample.zip file. The archive contains a .csv file with Bitcoin transactions for the past five days: bitcoin_sample.zip
- In a new terminal window, unzip the .csv file from the archive.
- In the terminal, navigate to the folder where you unzipped the Bitcoin transactions, then connect to your service using psql.
- At the psql prompt, use the COPY command to transfer data into your hypertable; a sketch of this command appears after this procedure. If the .csv file isn't in your current directory, specify the file path in the command. Because there are over a million rows of data, the COPY process can take a few minutes, depending on your internet connection and local client resources.
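The COPY command itself is not shown above. A sketch using psql's client-side \copy, assuming the hypothetical schema sketched earlier and a placeholder file name (replace it with the .csv extracted from bitcoin_sample.zip):

```sql
-- Run at the psql prompt, from the directory containing the extracted file.
-- The file name is a placeholder; HEADER is an assumption, drop it if the
-- file has no header row.
\copy transactions FROM 'bitcoin_sample.csv' WITH (FORMAT csv, HEADER true);
```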
Query the data
When you have your dataset loaded, you can start constructing some queries to discover what your data tells you. In this section, you learn how to write queries that answer these questions:
- What are the five most recent coinbase transactions?
- What are the five most recent transactions?
- What are the five most recent blocks?
What are the five most recent coinbase transactions?
Coinbase transactions are the first transaction in a block, and they include the reward a coin miner receives for mining the block. To find the most recent coinbase transactions, you can use a SELECT statement that searches for coinbase transactions only. If you include the transaction value in US Dollars, you'll notice that the value is $0 for each one. This is because the coin has not transferred ownership in coinbase transactions.
Finding the five most recent coinbase transactions
- Connect to the service that contains the Bitcoin dataset.
- At the psql prompt, run a query that selects the five most recent coinbase transactions; a sketch of such a query appears after this procedure.
- The data you get back shows the five most recent coinbase transactions, each with a USD value of $0.
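The query and its sample output are not reproduced above. A sketch of such a query, assuming the hypothetical column names from the earlier CREATE TABLE sketch:

```sql
-- Five most recent coinbase transactions, with block ID and USD value
-- (expected to be $0, as described above).
SELECT time, block_id, hash, output_total_usd
FROM transactions
WHERE is_coinbase IS TRUE
ORDER BY time DESC
LIMIT 5;
```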
What are the five most recent transactions?
This dataset contains Bitcoin transactions for the last five days. To find the most recent transactions in the dataset, you can use a SELECT statement.
In this case, you want to find transactions that are not coinbase transactions,
sort them by time in descending order, and take the top five results. You also
want to see the block ID, and the value of the transaction in US Dollars.
Finding the five most recent transactions
- Connect to the service that contains the Bitcoin dataset.
- At the psql prompt, run a query that selects the five most recent non-coinbase transactions; a sketch of such a query appears after this procedure.
- The data you get back shows the time, block ID, and USD value of the five most recent non-coinbase transactions.
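Again, neither the query nor its output appears above. A sketch, using the same hypothetical column names:

```sql
-- Five most recent non-coinbase transactions, sorted by time descending,
-- showing block ID and value in US Dollars.
SELECT time, block_id, hash, output_total_usd
FROM transactions
WHERE is_coinbase IS NOT TRUE
ORDER BY time DESC
LIMIT 5;
```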
What are the five most recent blocks?
In this procedure, you use a more complicated query to return the five most recent blocks, and show some additional information about each, including the block weight, the number of transactions in each block, and the total block value in US Dollars.
Finding the five most recent blocks
- Connect to the service that contains the Bitcoin dataset.
- At the psql prompt, run a query that selects the five most recent blocks and their summary statistics; a sketch of such a query appears after this procedure.
- The data you get back shows, for each of the five most recent blocks, the block weight, the number of transactions, and the total block value in US Dollars.
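A sketch of such a block-level query, again assuming the hypothetical column names; it aggregates transactions per block and reports the weight, transaction count, and total value in US Dollars:

```sql
-- Five most recent blocks with per-block summary statistics.
SELECT block_id,
       MAX(time)             AS block_time,
       COUNT(*)              AS transaction_count,
       SUM(weight)           AS block_weight,
       SUM(output_total_usd) AS block_value_usd
FROM transactions
GROUP BY block_id
ORDER BY block_time DESC
LIMIT 5;
```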
Compress your data using hypercore
As described above, hypercore is the hybrid row-columnar storage engine that stores data in the most efficient format for its lifecycle: the most recent chunks stay in the rowstore, ensuring fast inserts, updates, and low-latency single-record queries, while older chunks are automatically compressed into the columnstore, optimizing storage efficiency and accelerating analytical queries.
Optimize your data in the columnstore
To compress the data in the transactions table, do the following:
- Connect to your service. In the Console, open an SQL editor. The in-Console editors display the query speed. You can also connect to your service using psql.
- Convert data to the columnstore. You can do this either automatically or manually, as shown in the sketch after this procedure:
  - Automatically convert chunks in the hypertable to the columnstore at a specific time interval.
  - Manually convert all chunks in the hypertable to the columnstore.
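The commands themselves are not included above. A sketch using TimescaleDB's long-standing compression API, which maps onto the columnstore terminology used here (newer releases also expose columnstore-named equivalents); the interval is an assumption:

```sql
-- Requires the compression settings applied with ALTER TABLE ... SET
-- in the earlier sketch (segmentby block_id, orderby time DESC).

-- Automatic: add a policy that converts chunks older than the given
-- interval to columnar form on a schedule. The interval is an assumption;
-- tune it to your workload.
SELECT add_compression_policy('transactions', INTERVAL '1 day');

-- Manual: convert all existing chunks immediately. The second argument
-- skips chunks that are already compressed.
SELECT compress_chunk(c, true) FROM show_chunks('transactions') c;
```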
Take advantage of query speedups
Previously, data in the columnstore was segmented by the block_id column value. This means that fetching data by filtering or grouping on that column is more efficient. Ordering is set to time descending, so queries that order data in the same way see performance benefits.
- Connect to your service. In the Console, open an SQL editor. The in-Console editors display the query speed.
- Run a query that groups on block_id and orders by time descending, such as the sketch below.
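The query is not reproduced above. A sketch that groups on the segmentby column and orders by time, matching the layout described above (column names are the same assumptions as earlier):

```sql
-- Per-block aggregation that groups on block_id and orders by time
-- descending, so it benefits from the columnstore layout.
SELECT block_id,
       COUNT(*)     AS transaction_count,
       SUM(fee_usd) AS total_fees_usd,
       MAX(time)    AS latest_transaction
FROM transactions
GROUP BY block_id
ORDER BY latest_transaction DESC
LIMIT 5;
```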
The performance speedup is around two orders of magnitude: roughly 15 ms when the data is compressed in the columnstore, compared to around 1 second when it is uncompressed in the rowstore.