I was working on the design of a PlayBook app which would perform
some data logging, and considered using SQLite as the primary
mechanism. Alternatives included a simple text file (CSV format),
XML, or SharedObject (not really).
Since the user may want to acquire data for a very long time,
possibly on an ongoing basis for years, it seemed important to
spend at least a bit of time considering the storage space that
would be used.
With my prototype, which simply appends rows to a CSV file each
time a reading occurs, the file would grow to approximately 50MB
in the course of a year if the app were always running. Each
record is roughly 204 bytes including newline.
The nature of the data is largely integers, but I realized that
many of them would be negative, and wondered about how this would
affect the file size of a SQLite database.
SQLite is interesting in many ways, one of which is that it uses
a variable-length format for integers,
using anywhere from 1 to 8 bytes depending on the size. That page
may be inaccurate, because in a more
detailed description
it says it could be 1 to 9 bytes.
I built a small test database with an INTEGER PRIMARY KEY column,
some text, and another integer column, and created rows with a variety
of data. I then analyzed a hex dump of the records in the data area.
I immediately noted that my rows that had negative primary key values
were taking up space for lots of FF bytes for the primary key. Yikes!
I thought, is it really this bad for storing negative integers? Here's
a couple of samples (note, this output doesn't actually match the schema
I mentioned above, but it doesn't matter for this analysis):
0A FF FF FF FF FF FF FF FF 81 05 00 17 00 00 66 2D 31 32 37 "f-127"
08 FF FF FF FF FF FF FF FF FD 05 00 13 00 00 66 2D 33 "f-3"
08 FF FF FF FF FF FF FF FF FE 05 00 13 00 00 66 2D 32 "f-2"
08 03 05 00 13 00 00 66 2B 33 "f+3"
09 40 05 00 15 00 00 66 2B 36 34 "f+64"
0A 81 00 05 00 17 00 00 66 2B 31 32 38 "f+128"
Note all those FF bytes. What a waste of space. The positive integers
are stored compactly, with 3 and 64 requiring only a single byte, and 128
taking up two bytes. (Note that it would have made comparison easier had
I used the same length of text in each row.)
Upon further inspection and reading, I sorted it out. Every row,
even if you don't specify an INTEGER PRIMARY KEY, has a ROWID assigned
by the database to track it. If you specify any other column as the INTEGER type,
even negative values will be stored compactly. A -3 is stored as just FD,
rather than FF FF FF FF FF FF FF FD as shown in the ROWID sample earlier.
Here's the key point: if you have a column which is defined as the
INTEGER PRIMARY KEY, then SQLite doesn't bother creating a real column
just for that data. It will reuse the ROWID instead. But if you plan
to store lots of small negative values in that field, you'll potentially
be using up lots of extra bytes for all those FFs! The "varint" format
for ROWID is compact only for positive values.
It would take a very particular type of data for this even possibly
to be a problem. You'd have to have lots of negative values between
-1 and maybe -16 million or so.
If this were an issue, the "fix" would be to define a separate column
for your data, and a INTEGER PRIMARY KEY which you mostly ignore,
allowing SQLite to auto-assign and increment as positive values.
As a crude estimate of the difference in space used, here are file
sizes for the two versions, one where we store our negative values
in the INTEGER PRIMARY KEY column where they end up becoming the
ROWID, and the other where we assign a second column just for that
data. Each example assumes a million rows, with data going from
-1 to -999,999, and I'm ignoring any overhead for tracking b-tree pages,
the database header, etc.
| Mode |
File Size |
| reuse ROWID |
10.5MB |
| separate field |
6.6MB |
Okay, well, that about ties things up. Quite clearly, even on the
PlayBook, it's unlikely a difference of 4MB is going to matter to anyone.
And none of this would have affected me anyway, as the values I need
for my primary key fields turn out not to be negative!