Slow queries with sqlite database generated by EF Core. Indices are present

638 views Asked by At

The schema

The sqlite database described below is currently 52 GB. It was generated as part of an Entity Framework Core 6.0 project.

The Trades table

CREATE TABLE IF NOT EXISTS "Trades" (
    "Id"        INTEGER NOT NULL CONSTRAINT "PK_Trades" PRIMARY KEY AUTOINCREMENT,
    "SymbolId"  INTEGER NOT NULL,
    "Price"     TEXT    NOT NULL,
    "Quantity"  TEXT    NOT NULL,
    "TimeStamp" TEXT    NOT NULL,
    "Side"      INTEGER NOT NULL,
    "Type"      INTEGER NOT NULL,
    CONSTRAINT "FK_Trades_Symbols_SymbolId" FOREIGN KEY ("SymbolId") REFERENCES "Symbols" ("Id") ON DELETE CASCADE
);

This holds trades from the Kraken cryptocurrency exchange.

The data was populated from CSVs downloaded here:

https://support.kraken.com/hc/en-us/articles/360047543791-Downloadable-historical-market-data-time-and-sales

The most recent trades were retrieved via API.

Data excerpt:

sqlite> SELECT * FROM Trades LIMIT 5;
Id        SymbolId  Price  Quantity   TimeStamp            Side  Type
--------  --------  -----  ---------  -------------------  ----  ----
17272735  29        2.364  29.18557   2021-08-10 15:31:26  0     0
17272736  29        2.364  12.50281   2021-08-10 15:31:34  0     0
17272737  29        2.363  416.76043  2021-08-10 15:31:36  0     0
17272738  29        2.365  38.94156   2021-08-10 15:32:00  0     0
17272739  29        2.363  0.41676    2021-08-10 15:32:05  0     0

The current size of the table:

sqlite> SELECT COUNT(*) FROM Trades;
COUNT(*)
---------
535560814
Run Time: real 7.746 user 0.984375 sys 6.750000

The Symbols table

CREATE TABLE IF NOT EXISTS "Symbols" (
    "Id" INTEGER NOT NULL CONSTRAINT "PK_Symbols" PRIMARY KEY AUTOINCREMENT,
    "Name" TEXT NOT NULL
);

Each trading pair has a symbol. E.g. XBTUSD for Bitcoin.

Data excerpt:

sqlite> SELECT * FROM Symbols LIMIT 5;
Id  Name
--  --------
29  1INCHEUR
30  1INCHUSD
31  AAVEAUD
32  AAVEETH
33  AAVEEUR

Some key indices

Indices on some critical columns:

CREATE UNIQUE INDEX "IX_Symbols_Name"     ON "Symbols" ("Name");
CREATE        INDEX "IX_Trades_SymbolId"  ON "Trades"  ("SymbolId");
CREATE        INDEX "IX_Trades_TimeStamp" ON "Trades"  ("TimeStamp");

Query performance

Most of the basic queries I've run so far seem to be quite slow. Here are some examples.

10 most recent XBTUSD trades (31 seconds)

SELECT * 
FROM Trades 
WHERE SymbolId = (SELECT Id FROM Symbols WHERE Name = 'XBTUSD') 
ORDER BY TimeStamp DESC 
LIMIT 10;

Run Time: real 31.641 user 28.984375 sys 2.640625

Id         SymbolId  Price    Quantity    TimeStamp                Side  Type
---------  --------  -------  ----------  -----------------------  ----  ----
552833548  377       48683.1  0.157       2021-12-06 06:16:17.804  1     0
552833547  377       48681.5  0.00049872  2021-12-06 06:16:04.855  0     0
552833546  377       48681.4  0.32        2021-12-06 06:16:02.729  1     0
552833545  377       48680.4  0.1         2021-12-06 06:15:53.826  0     0
552833544  377       48697.4  0.05        2021-12-06 06:15:39.56   1     1
552833543  377       48694.5  0.04923492  2021-12-06 06:15:22.787  0     0
552833542  377       48690.2  0.00210422  2021-12-06 06:15:22.786  0     0
552833541  377       48689.6  0.02201001  2021-12-06 06:15:22.487  0     0
552833540  377       48682.5  0.00052279  2021-12-06 06:15:22.486  0     0
552833539  377       48652.3  0.00011573  2021-12-06 06:15:22.485  0     0

Show high and low by month for XBTUSD (64 seconds)

SELECT 
    strftime('%Y-%m', TimeStamp)  AS date, 
    COUNT(*)                      AS trade_count, 
    min( CAST(Price AS DECIMAL) ) AS high, 
    max( CAST(Price AS DECIMAL) ) AS low 
FROM Trades 
WHERE 
    SymbolId = (SELECT Id FROM Symbols WHERE Name = 'XBTUSD') 
GROUP BY strftime('%Y-%m', TimeStamp)
ORDER BY date;

Run Time: real 64.743 user 55.421875 sys 8.625000

date     trade_count  high       low
-------  -----------  ---------  ---------
2013-10  422          122        207.30246
2013-11  7147         201.04     1198.8009
2013-12  14964        380        1130
2014-01  7747         737.15115  1019
2014-02  8990         450        835
2014-03  4314         436.07375  720
2014-04  3063         353.75804  583.75327
2014-05  2046         422.19     629.0355
2014-06  2533         525        677.90898
2014-07  838          566.75121  658.87046
2014-08  1756         455        602.36146
2014-09  930          295        527.99987
2014-10  1188         273.32564  418.98998
...

Number of trades by symbol (53 seconds)

SELECT Symbols.Name, SymbolId, COUNT(*) AS trade_count
FROM Trades JOIN Symbols ON Trades.SymbolId = Symbols.Id
GROUP BY SymbolId
ORDER BY trade_count;

Run Time: real 53.176 user 46.031250 sys 7.031250

Name       SymbolId  trade_count
---------  --------  -----------
WBTCXBT    369       561
RAYUSD     291       674
WBTCUSD    368       1003
YFIAUD     407       1460
GHSTGBP    167       1488
WBTCEUR    367       1860
FILAUD     154       2036
RAYEUR     290       2154
BNTGBP     88        2437
GRTAUD     174       2514
ZRXGBP     417       2732
BNTXBT     90        3196
SUSHIGBP   333       3387
...

The question

Do you have any suggestions for how to improve the performance of the queries shown above?

I.e. would you recommend additional indices? Or perhaps there are some sqlite specific parameters I should take a look at?

Or would you say that, based on the size of the database and the fact that it's sqlite, queries like this are just going to be slow. If that's the case, I guess I can consider using SQL Server or Postgresql. I like the idea of sqlite since it has the lowest overhead as far as setup goes. It's also free (unlike SQL Server).

Notes

Although I'm showing sqlite queries above, the database and schema were created via an Entity Framework Core 6.0 project.

The project which includes code for the EF Core database as well as programs which import the trade data from CSV and API is available here:

https://github.com/dharmatech/kraken-trades-database

The entity models and database context:

https://github.com/dharmatech/kraken-trades-database/blob/008-import-api/KrakenTradesDatabase/Model.cs

Code that imports the CSV (if you're curious):

https://github.com/dharmatech/kraken-trades-database/blob/008-import-api/KrakenImportCsv/Program.cs

Code that imports via API (if you're curious):

https://github.com/dharmatech/kraken-trades-database/blob/008-import-api/KrakenImportApi/Program.cs

TEXT columns

Lasse asks a good question in the comments below:

is there a reason why your price and amount was typed to text and not to REAL

He's referring to these columns in the Trades table:

"Price"     TEXT    NOT NULL,
"Quantity"  TEXT    NOT NULL,

That's what EF Core generated from the following C# model class:

[Index(nameof(TimeStamp))]
public class Trade
{
    public int Id { get; set; }

    public int SymbolId { get; set; }

    public Symbol Symbol { get; set; } // navigation property
    public decimal Price { get; set; }
    public decimal Quantity { get; set; }
    public DateTime TimeStamp { get; set; }

    public OrderSide Side { get; set; }
    public OrderTypeMinimal Type { get; set; }
}
0

There are 0 answers