- Ingest data into a service: set up and connect to a service, create tables and hypertables, and ingest data.
- Query your data: obtain information, including finding the most recent transactions on the blockchain, and gathering information about the transactions using aggregation functions.
- Compress your data using hypercore: compress data that is no longer needed for the highest-performance queries, but is still accessed regularly for real-time analytics.
Prerequisites
To follow the steps on this page:
- Create a target service with the real-time analytics capability enabled. You need your connection details. This procedure also works for self-hosted TimescaleDB.
Ingest data into a service
This tutorial uses a dataset that contains Bitcoin blockchain data for the past five days, stored in a hypertable named transactions.
Optimize time-series data using hypertables
Hypertables are tables that automatically partition your time-series data by time. Time-series data represents the way a system, process, or behavior changes over time. Hypertables enable the database to work efficiently with time-series data. Each hypertable is made up of child tables called chunks. Each chunk is assigned a range of time, and only contains data from that range. When you run a query, the database identifies the correct chunk and runs the query on it, instead of going through the entire table.

Hypercore is the hybrid row-columnar storage engine used by hypertables. Traditional databases force a trade-off between fast inserts (row-based storage) and efficient analytics (columnar storage). Hypercore eliminates this trade-off, allowing real-time analytics without sacrificing transactional capabilities. Hypercore dynamically stores data in the most efficient format for its lifecycle:
- Row-based storage for recent data: the most recent chunk (and possibly more) is always stored in the rowstore, ensuring fast inserts, updates, and low-latency single-record queries. Additionally, row-based storage is used as a write-through for inserts and updates to columnar storage.
- Columnar storage for analytical performance: chunks are automatically compressed into the columnstore, optimizing storage efficiency and accelerating analytical queries.
- Connect to your service. In the Console, open an SQL editor. The in-Console editors display the query speed. You can also connect to your service using psql.
- Create a hypertable for your time-series data using CREATE TABLE. For efficient queries on data in the columnstore, remember to segment your data by the column you will use most often to filter it. If you are self-hosting TimescaleDB v2.19.3 and below, create a relational table, then convert it to a hypertable using create_hypertable. You then enable hypercore with a call to ALTER TABLE. A sketch of these statements, together with the indexes from the following steps, appears after this procedure.
- Create an index on the hash column to make queries for individual transactions faster.
- Create an index on the block_id column to make block-level queries faster. When you create a hypertable, it is partitioned on the time column and an index on the time column is created automatically. However, you often filter time-series data on other columns as well, and additional indexes improve the performance of those queries.
- Create a unique index on the time and hash columns to make sure you don't accidentally insert duplicate records.
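The SQL for these steps is not reproduced above. The following is a minimal sketch, assuming a hypothetical column layout chosen to match the queries later in this tutorial (the real dataset's column names and types may differ), and using the create_hypertable route described for self-hosted TimescaleDB v2.19.3 and below:

```sql
-- Hypothetical schema: column names and types are assumptions based on
-- the queries later in this tutorial, not the exact tutorial schema.
CREATE TABLE transactions (
   time             TIMESTAMPTZ      NOT NULL,
   block_id         INT,
   hash             TEXT,
   weight           INT,
   is_coinbase      BOOLEAN,
   output_total     BIGINT,
   output_total_usd DOUBLE PRECISION,
   fee              BIGINT,
   fee_usd          DOUBLE PRECISION
);

-- Partition the table by the time column (the route mentioned above for
-- self-hosted TimescaleDB v2.19.3 and below).
SELECT create_hypertable('transactions', 'time');

-- Enable columnar compression, segmenting by the column used most often
-- to filter the data and ordering by time descending.
ALTER TABLE transactions SET (
   timescaledb.compress,
   timescaledb.compress_segmentby = 'block_id',
   timescaledb.compress_orderby   = 'time DESC'
);

-- Index for looking up individual transactions by hash.
CREATE INDEX hash_idx ON transactions (hash);

-- Index for block-level queries.
CREATE INDEX block_idx ON transactions (block_id);

-- Unique index to prevent accidental duplicate records.
CREATE UNIQUE INDEX time_hash_idx ON transactions (time, hash);
```

On newer versions that create hypertables directly with CREATE TABLE, the partitioning and columnstore settings can be declared there instead; check the documentation for your version.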
Load financial data
The dataset contains around 1.5 million Bitcoin transactions, covering five days of trades. It includes information about each transaction, along with its value in satoshi. It also states whether a transaction is a coinbase transaction, and the reward a coin miner receives for mining the block. To ingest data into the tables that you created, you need to download the dataset and copy the data to your database.
- Download the bitcoin_sample.zip file. The archive contains a .csv file with Bitcoin transactions for the past five days: bitcoin_sample.zip
- In a new terminal window, unzip the .csv file from the archive.
- In the terminal, navigate to the folder where you unzipped the Bitcoin transactions, then connect to your service using psql.
- At the psql prompt, use the COPY command to transfer data into your hypertable; a sketch of this command appears after this procedure. If the .csv file isn't in your current directory, specify the file path in the command. Because there are over a million rows of data, the COPY process can take a few minutes, depending on your internet connection and local client resources.
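The COPY command itself is not shown above. A sketch using psql's client-side \copy, assuming the hypothetical schema sketched earlier and a placeholder file name (replace it with the .csv extracted from bitcoin_sample.zip):

```sql
-- Run at the psql prompt, from the directory containing the extracted file.
-- The file name is a placeholder; HEADER is an assumption, drop it if the
-- file has no header row.
\copy transactions FROM 'bitcoin_sample.csv' WITH (FORMAT csv, HEADER true);
```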
Query the data
When you have your dataset loaded, you can start constructing some queries to discover what your data tells you. In this section, you learn how to write queries that answer these questions:
- What are the five most recent coinbase transactions?
- What are the five most recent transactions?
- What are the five most recent blocks?
What are the five most recent coinbase transactions?
Coinbase transactions are the first transaction in a block, and they include the reward a coin miner receives for mining the block. To find the most recent coinbase transactions, you can use a SELECT statement that searches for coinbase transactions only. If you include the transaction value in US Dollars, you'll notice that the value is $0 for each one. This is because the coin has not transferred ownership in coinbase transactions.
Finding the five most recent coinbase transactions
- Connect to the service that contains the Bitcoin dataset.
- At the psql prompt, run a query that selects the five most recent coinbase transactions; a sketch of such a query appears after this procedure.
- The data you get back shows the five most recent coinbase transactions, each with a USD value of $0.
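The query and its sample output are not reproduced above. A sketch of such a query, assuming the hypothetical column names from the earlier CREATE TABLE sketch:

```sql
-- Five most recent coinbase transactions, with block ID and USD value
-- (expected to be $0, as described above).
SELECT time, block_id, hash, output_total_usd
FROM transactions
WHERE is_coinbase IS TRUE
ORDER BY time DESC
LIMIT 5;
```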
What are the five most recent transactions?
This dataset contains Bitcoin transactions for the last five days. To find the most recent transactions in the dataset, you can use a SELECT statement.
In this case, you want to find transactions that are not coinbase transactions,
sort them by time in descending order, and take the top five results. You also
want to see the block ID, and the value of the transaction in US Dollars.
Finding the five most recent transactions
- Connect to the service that contains the Bitcoin dataset.
- At the psql prompt, run a query that selects the five most recent non-coinbase transactions; a sketch of such a query appears after this procedure.
- The data you get back shows the time, block ID, and USD value of the five most recent non-coinbase transactions.
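Again, neither the query nor its output appears above. A sketch, using the same hypothetical column names:

```sql
-- Five most recent non-coinbase transactions, sorted by time descending,
-- showing block ID and value in US Dollars.
SELECT time, block_id, hash, output_total_usd
FROM transactions
WHERE is_coinbase IS NOT TRUE
ORDER BY time DESC
LIMIT 5;
```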
What are the five most recent blocks?
In this procedure, you use a more complicated query to return the five most recent blocks, and show some additional information about each, including the block weight, the number of transactions in each block, and the total block value in US Dollars.
Finding the five most recent blocks
- Connect to the service that contains the Bitcoin dataset.
- At the psql prompt, run a query that selects the five most recent blocks and their summary statistics; a sketch of such a query appears after this procedure.
- The data you get back shows, for each of the five most recent blocks, the block weight, the number of transactions, and the total block value in US Dollars.
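A sketch of such a block-level query, again assuming the hypothetical column names; it aggregates transactions per block and reports the weight, transaction count, and total value in US Dollars:

```sql
-- Five most recent blocks with per-block summary statistics.
SELECT block_id,
       MAX(time)             AS block_time,
       COUNT(*)              AS transaction_count,
       SUM(weight)           AS block_weight,
       SUM(output_total_usd) AS block_value_usd
FROM transactions
GROUP BY block_id
ORDER BY block_time DESC
LIMIT 5;
```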
Compress your data using hypercore
As described above, hypercore is the hybrid row-columnar storage engine that stores data in the most efficient format for its lifecycle: the most recent chunks stay in the rowstore, ensuring fast inserts, updates, and low-latency single-record queries, while older chunks are automatically compressed into the columnstore, optimizing storage efficiency and accelerating analytical queries.
Optimize your data in the columnstore
To compress the data in the transactions table, do the following:
- Connect to your service. In the Console, open an SQL editor. The in-Console editors display the query speed. You can also connect to your service using psql.
- Convert data to the columnstore. You can do this either automatically or manually, as shown in the sketch after this procedure:
  - Automatically convert chunks in the hypertable to the columnstore at a specific time interval.
  - Manually convert all chunks in the hypertable to the columnstore.
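The commands themselves are not included above. A sketch using TimescaleDB's long-standing compression API, which maps onto the columnstore terminology used here (newer releases also expose columnstore-named equivalents); the interval is an assumption:

```sql
-- Requires the compression settings applied with ALTER TABLE ... SET
-- in the earlier sketch (segmentby block_id, orderby time DESC).

-- Automatic: add a policy that converts chunks older than the given
-- interval to columnar form on a schedule. The interval is an assumption;
-- tune it to your workload.
SELECT add_compression_policy('transactions', INTERVAL '1 day');

-- Manual: convert all existing chunks immediately. The second argument
-- skips chunks that are already compressed.
SELECT compress_chunk(c, true) FROM show_chunks('transactions') c;
```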
Take advantage of query speedups
Previously, data in the columnstore was segmented by the block_id column value. This means that fetching data by filtering or grouping on that column is more efficient. Ordering is set to time descending, so queries that order data in the same way see performance benefits.
- Connect to your service. In the Console, open an SQL editor. The in-Console editors display the query speed.
- Run a query that groups on block_id and orders by time descending, such as the sketch below.
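The query is not reproduced above. A sketch that groups on the segmentby column and orders by time, matching the layout described above (column names are the same assumptions as earlier):

```sql
-- Per-block aggregation that groups on block_id and orders by time
-- descending, so it benefits from the columnstore layout.
SELECT block_id,
       COUNT(*)     AS transaction_count,
       SUM(fee_usd) AS total_fees_usd,
       MAX(time)    AS latest_transaction
FROM transactions
GROUP BY block_id
ORDER BY latest_transaction DESC
LIMIT 5;
```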
The performance speedup is around two orders of magnitude: roughly 15 ms when the data is compressed in the columnstore, compared to around 1 second when it is uncompressed in the rowstore.