Database field lengths for storing data with unknown und technically unlimited length?

351 views Asked by At

I have to store DOIs in a MySQL database. The handbook says:

There is no limitation on the length of a DOI name.

So far, the maximum length of a DOI in my current data is 78 chars. Which field length would you recommend in order to not waste storage space and to be on the safe side? In general:

How do you handle the problem of not knowing the maximum length of input data that has to be stored in a database, considering space and the efficiency of transactions?

EDIT

There are these two (simplified) tables document and topic with a one-to-many relationship:

CREATE TABLE document
(
  ID int(11) NOT NULL,
  DOI ??? NOT NULL,
  PRIMARY KEY (ID) 
);

CREATE TABLE topic
(
  ID int(11) NOT NULL,
  DocID int(11) NOT NULL,
  Name varchar(255) NOT NULL,
  PRIMARY KEY (ID),
  FOREIGN KEY (DocID) REFERENCES Document(ID), UNIQUE(DocID)
);

I have to run the following (simplified) query for statistics, returning the total value of referenced topic-categories per document (if there are any references):

SELECT COUNT(topic.Name) AS number, document.DOI
FROM document LEFT OUTER JOIN topic
ON document.ID = topic.DocID
GROUP BY document.DOI;

The character set used is utf_8_general_ci.

2

There are 2 answers

1
tadman On

TEXT and VARCHAR can store 64KB. If you're being extra paranoid, use LONGTEXT which allows 4GB, though if the names are actually longer than 64KB then that is a really abusive standard. VARCHAR(65535) is probably a reasonable accommodation.

Since VARCHAR is variable length then you really only pay for the extra storage if and when it's used. The limit is just there to cap how much data can, theoretically, be put in the field.

0
Rick James On

Space is not a problem; indexing may be a problem. Please provide the queries that will need an index on this column. Also provide the CHARACTER SET needed. With those, we can discuss the ramifications of various cutoffs: 191, 255, 767, 3072, etc.