DocTable Example: Pickle and Text Files

Here I show a bit about how to use picklefile and textfile column types. DocTable transparently handles saving and reading column data as separate files when data is large to improve performance of select queries. It will automatically create a folder in the same directory as your sqlite database and save or read file data as if you were working with a regular table entry.

In [1]:
import os
import sys
sys.path.append('..')
import doctable as dt
In [2]:
fname = 'tst3.db'
if os.path.exists(fname):
    os.remove(fname)
db = dt.DocTable(schema=[('idcol', 'id'),('picklefile', 'pic'), ('textfile','text')], fname=fname)
In [3]:
a = (1,2,43,4,5)
db.insert({'pic':a, 'text':str(a)})
Out[3]:
<sqlalchemy.engine.result.ResultProxy at 0x7f41a59502b0>
In [4]:
db.select() # regular select using the picklefile datatype
Out[4]:
[(1, (1, 2, 43, 4, 5), '(1, 2, 43, 4, 5)')]

Because doctable creates a transparent interface to work with these separate files, we need to use a new database to read the raw table schema and show the filenames that DocTable uses to reference stored data files.

In [5]:
dbchecker = dt.DocTable(fname=fname) # schema is inferred from db file
dbchecker.select() # this version without the doctable wrapper
Out[5]:
[(1, '384272277333.pic', '468571887065.txt')]

For performance reasons, DocTable never deletes stored file data unless you call the .clean_col_files() method directly. It will raise an exception if a referenced file is missing, and delete all files which are not referenced in the table.

In [6]:
# deletes files not in db and raise error if some db files not in filesystem
db.clean_col_files('pic')
db.clean_col_files('text')

Now I create another DocTable with a changed fpath argument. Because the argument changed, DocTable will raise an exception when selecting or calling .clean_col_files(). Be wary of this!

In [7]:
# now specify fpath in column type arguments.
db2 = dt.DocTable(schema=[('idcol', 'id'),('picklefile', 'pic', dict(), dict(fpath='custom_file_loc'))], fname=fname)
try:
    db2.clean_col_files('pic')
except FileNotFoundError:
    print('threw error because no files were found in the custom_file_loc folder, even though the db has a record.')
threw error because no files were found in the custom_file_loc folder, even though the db has a record.