Question:
First I create the model in my application, then the Entity Framework generates the SQL to create the table. The first declaration generates a column with type varchar(20)
, the second generates it as longtext
.
Example
[StringLength(20)]
public string Codigo { get; set; }
public string CodigoDois { get; set; }
Doubts
Is there any difference between these two statements (talking about database space allocation)? Even if they store the same value as "test" which is 6 characters long?
If I know a field has a variation of its length, let's say between 10-15 characters, would it be a better decision to limit it to the maximum length or leave it "unlimited" (talking about database space allocation)?
Answer:
When the Entity Framework generates the tables in your database it checks the types of each field, in the case of the STRING type when you specify the size it makes the same specification for the database with its corresponding type.
In the case of your
[StringLength(20)]
public string Codigo { get; set; }
The MySQL counterpart is varchar(20), but when the same string type is declared without a fixed length the Entity Framework allocates as much as possible for that type in the database than in the case of MySQL and longtext.
BLOB type columns like LONGTEXT are inherently variable length and take up almost no storage when not used. The space needed by them themselves is not affected in the case of a NULL value, in the case of a use as the text "teste"
the allocation is defined in the length of the string passed.
Advantages / Disadvantages of BLOBs vs. VARCHARs
All comments in this paragraph concerning type VARCHAR are valid for type CHAR as well. Each comment ends with a BLOB+ or VARCHAR+ tag to indicate which data type is better.
- Do you know maximum length of your data?
With VARCHARs you need to declare the maximum string length. With blobs you don't have to worry about it. BLOB +
- Do you need to store very long strings?
A single VARCHAR column is restricted to 32K bytes (ie about 10,000 Unicode characters) . The maximum blob size is (according to the Usage Guide);
- page size 1kb => 64 Mb
- page size 2kb => 512 Mb
- 4KB page size => 4Gb
- 8KB page size => 32Gb
BLOB +
- Do you need to store many columns of long text in single table?
The total (uncompressed) line length is restricted to 64K. VARCHARs are stored in-line directly, so you cannot store too many long strings in one line. Blobs are represented by their blob-id, so it only uses 8 bytes of maximum 64K. BLOB +
- Do you want to minimize the call between client and server?
VARCHAR data is fetched along with other row data in one fetch operation, and typically multiple rows are sent across the network at the same time. Every single blob needs to do an extra open/fetch
. VARCHAR +
- Do you want to minimize the amount of data transferred between client and server?
The advantage of blobs is that when fetching row you get only blob-id, so you can decide whether you want to fetch blob data or not. In older versions of InterBase there was a problem that VARCHARs were sent over the network in the declared full length. This issue has already been fixed in Firebird 1.5 and InterBase 6.5. draw (BLOB+ for older server versions)
- Do you want to minimize used space?
VARCHARs are RLE compressed (in fact whole rows are compressed except blobs). Maximum 128 bytes can be compressed to 2 bytes. This means that even empty VARCHAR (32000) will occupy 500 + 2 bytes.
Blobs are not compressed, but empty (ie null) blob will only occupy 8 bytes of blob-id (and will later be RLE compressed). non-empty blob can be stored on the same page as other row data (if applicable), or on separate page. Small blob that fits into the data page has an overhead of about 40 bytes (or a little more). Large blob has the same 40 bytes overhead on the data page, plus 28 bytes overhead on each blob page (30 bytes on the first). A blob page cannot contain more than one blob (ie, blob pages are not shared as data pages). For example. for 4K page size, if you store 5K blob, two blob type pages will be allocated, which means you lose 3K of space! In other words – the larger page size, the greater probability that small blobs will fit into the data page, but also more wasted space if separate blob pages are needed for large blobs. VARCHAR + (except VARCHARs with extremely large declared length, or tables with many NULL blobs)
- Do you need a table with extremely large number of rows?
Each row is identified by DB_KEY, which is a 64-bit, 32-bit value where it represents relation ID and 32 bits are used to locate the row. Theoretical maximum number of rows in a table is 2^32 (but for various reasons the true maximum is even smaller). Blob-ids are assigned from the same address space as DB_KEYs, this means the more blobs in the table, the fewer DB_KEYs will remain to face queues. On the other hand, when stored rows are wide (for example if they contain long VARCHARs), then fewer rows will fit in the data page and many DB_KEY values will remain unasigned anyway. varchar + ?
- Do you want a good performance?
Because large blobs are stored outside data pages, they increase row "density" in data pages and thus caching efficiency (reduce the number of I/O operations during the search). BLOB +
- Do you need to search on the content of text columns?
In the VARCHAR column you can use operators like '= ', ' >', among them, from () , case sensitive like and starting, case insensitive CONTAINING . In most cases index can be used to speed up search. Blobs cannot be indexed, and you are restricted to LIKE, starting, and containing operators. You cannot directly compare blobs with the '= ', ' >' etc operators. (unless you use UDF) so you can't, for example, join tables into Blob fields. VARCHAR +
- Do you want to search the content of these texts with CONTENTS?
Containig can be used to perform a case-insensitive search for the contents of the VARCHAR field. (No use of index) Because you cannot set collating order for blob columns, you cannot use case insensitive search fully with national characters in blob columns (only lower half of character set will be case insensitive) . (Alternatively, you can use UDF). Firebird 2 already allows you to define grouping of text (and binary) columns. VARCHAR +
- Do you need to capitalize text column content?
You can use the built-in UPPER() function in VARCHAR but not in blob. (Also CAST, MIN , MAX cannot be used with blobs ) VARCHAR +
Cannot sort by blob column. (AND GROUP BY, DISTINCT , UNION, JOIN ON ) It is not possible to concatenate blob columns. VARCHAR +
There is no built-in cast function (CAST ) to convert blob to VARCHAR or VARCHAR to blob. (But it is possible to write UDF for this purpose. ) Since Firebird 1.5 it is possible to use builtin function SUBSTRING to convert blob to VARCHAR (but FROM and clauses cannot exceed 32K) . to draw
It is not possible to assign a value to blob directly in the SQL command, for example. Insert tab values ( MyBlob ) ( 'abc' ); (But it is possible to use UDF to convert string to blob). VARCHAR +
Firebird – 0.9.4 already has this draw functionality
- Do you need good security on these columns of text?
To retrieve the data from the table, you need to be granted the SELECT privilege. To recover blob, you need to know unique blob -id (stored in table), but Firebird/InterBase will not check if you have any rights to table blob belongs to. This means that everyone who knows or guesses blob -id right can read the blob without any rights at the table. ( You can try it with ISQL and BLOBDUMP command . ) VARCHAR +
You can see more details on the links below.