[Question posted by a user on YugabyteDB Community Slack]
I'm doing the Oracle DB migration to YB and when testing and noticed after Import the size of the YB is tripled on each node, meaning 50GB of Oracle DB ended up occupying 150GB on each node for 3 node cluster. Isn't YugabyteDB able to compress natively? Please note that RF is set to 3. Just curious to know about a significant increase in storage requirement.
This is my table schema:
CREATE TABLE "CT2_INVOICELINE_ORIGINAL"
( "INVOICEID" NUMBER(10,0) NOT NULL ENABLE,
"INVOICELINEID" NUMBER(10,0) NOT NULL ENABLE,
"INVOFIELD01" VARCHAR2(100),
"INVOFIELD02" VARCHAR2(100),
"INVOFIELD03" VARCHAR2(100),
"INVOFIELD04" VARCHAR2(100),
"INVOFIELD05" VARCHAR2(100),
"INVOFIELD06" VARCHAR2(100),
"INVOFIELD07" VARCHAR2(100),
"INVOFIELD08" VARCHAR2(100),
"INVOFIELD09" VARCHAR2(100),
"INVOFIELD10" VARCHAR2(100),
"INVOFIELD11" VARCHAR2(100),
"INVOFIELD12" VARCHAR2(100),
"INVOFIELD13" VARCHAR2(100),
"INVOFIELD14" VARCHAR2(100),
"INVOFIELD15" VARCHAR2(100),
"INVOFIELD16" VARCHAR2(100),
"INVOFIELD17" VARCHAR2(100),
"INVOFIELD18" VARCHAR2(100),
"INVOFIELD19" VARCHAR2(100),
"INVOFIELD20" VARCHAR2(100),
"INVOFIELD21" VARCHAR2(100),
"INVOFIELD22" VARCHAR2(100),
"INVOFIELD23" VARCHAR2(100),
"INVOFIELD24" VARCHAR2(100),
"INVOFIELD25" VARCHAR2(100),
"INVOFIELD26" VARCHAR2(100),
"INVOFIELD27" VARCHAR2(100),
"INVOFIELD28" VARCHAR2(100),
"INVOFIELD29" VARCHAR2(100),
"INVOFIELD30" VARCHAR2(100),
"INVOFIELD31" VARCHAR2(100),
"INVOFIELD32" VARCHAR2(100),
"INVOFIELD33" VARCHAR2(100),
"INVOFIELD34" VARCHAR2(100),
"INVOFIELD35" VARCHAR2(100),
"INVOFIELD36" VARCHAR2(100),
"INVOFIELD37" VARCHAR2(100),
"INVOFIELD38" VARCHAR2(100),
"INVOFIELD39" VARCHAR2(100),
"INVOFIELD40" VARCHAR2(100),
"INVOFIELD41" VARCHAR2(100),
"INVOFIELD42" VARCHAR2(100),
"INVOFIELD43" VARCHAR2(100),
"INVOFIELD44" VARCHAR2(100),
"INVOFIELD45" VARCHAR2(100),
"INVOFIELD46" VARCHAR2(100),
"INVOFIELD47" VARCHAR2(100),
"INVOFIELD48" VARCHAR2(100),
"INVOFIELD49" VARCHAR2(100),
"INVOFIELD50" VARCHAR2(100),
"INVOFIELD51" VARCHAR2(100),
"INVOFIELD52" VARCHAR2(100),
"INVOFIELD53" VARCHAR2(100),
"INVOFIELD54" VARCHAR2(100),
"INVOFIELD55" VARCHAR2(100),
"INVOFIELD56" VARCHAR2(100),
"INVOFIELD57" VARCHAR2(100),
"INVOFIELD58" VARCHAR2(100),
"INVOFIELD59" VARCHAR2(100),
"INVOFIELD60" VARCHAR2(100)
) SEGMENT CREATION IMMEDIATE
PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
NOCOMPRESS LOGGING
STORAGE(INITIAL 26255360 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE "MISEK_DATA" ;
The source table does have 1 - 3 characters on most of the columns and around 10+ null columns as well.
When putting only 1 char in the varchar(100) you have 4x storage overhead on YugabyteDB. So if you have really small or null values this can be explained.
This is because of the way YugabyteDB currently stores data, where each column is a separate key-value in the underlying storage engine (rocksdb), while Oracle stores the full row as packed in a single tuple. Explained in detail in the docs: https://docs.yugabyte.com/latest/architecture/docdb/persistence/
“Packed row” format is under development that will fix this overhead soon: https://github.com/yugabyte/yugabyte-db/issues/3520