DocTable Overview

A DocTable instance is a reference to a single database table, and can be used to insert, delete, select, or update the rows of that table. While it can infer the table schema upon connecting, typically it is used with an in-code schema definition including column types, constraints, and indices. This allows DocTable to have additional information about the data you store and how you can use it. See the full documentation for a full list of class methods.

In [1]:
import random
import pandas as pd
import numpy as np

import sys
import doctable as dt

Schema Definitions

First we create a schema definiton and use it to instantiate the DocTable. By default, the DocTable will connect to an in-memory database, so the examples here will not rely on a filesystem database. Printing the object will show the table name as well as the number of rows in the table.

In [2]:
schema = (
    ('integer','id',dict(primary_key=True, autoincrement=True)),
    ('string','name', dict(nullable=False)),
    ('boolean', 'is_old'),
db = dt.DocTable(target=':memory:', schema=schema)
<DocTable::sqlite:///:memory::_documents_ ct: 0>

Access table columns as objects by subscripting the doctable directly.

In [3]:
Column('id', Integer(), table=<_documents_>, primary_key=True, nullable=False)

As we'll show later, these column objects also have some operators defined such that they can be used to construct complex queries and functions.

In [4]:
db['id'] > 3
<sqlalchemy.sql.elements.BinaryExpression object at 0x7f4f6c69d828>

Inserting Rows

We use the .insert() method to insert a row passed as a dictionary of column name -> value entries.

In [5]:
for i in range(5):
    age = random.random() # number in [0,1]
    is_old = age > 0.5
    row = {'name':'user_'+str(i), 'age':age, 'is_old':is_old}
<DocTable::sqlite:///:memory::_documents_ ct: 5>

Select Statements

Now we show how to select data from the table. Use the .count() method to check the number of rows. It also accepts some column conditionals to count entries that satisfy a given criteria

In [6]:
db.count(), db.count(db['is_old']==True)
(5, 3)

Use the .select() method with no arguments to retrieve all rows of the table. You can also choose to select one or more columns to select.

In [7]:
[(1, 'user_0', 0.8444218515250481, True),
 (2, 'user_1', 0.7579544029403025, True),
 (3, 'user_2', 0.420571580830845, False),
 (4, 'user_3', 0.25891675029296335, False),
 (5, 'user_4', 0.5112747213686085, True)]
In [8]:'name')
['user_0', 'user_1', 'user_2', 'user_3', 'user_4']
In [9]:['id','name'])
[(1, 'user_0'), (2, 'user_1'), (3, 'user_2'), (4, 'user_3'), (5, 'user_4')]
In [10]:['age'].sum)

The SUM() and COUNT() SQL functions have been mapped to .sum and .count attributes of columns.

In [11]:[db['age'].sum,db['age'].count])
[(2.7931393069577677, 5)]

Alternatively, to see the results as a pandas dataframe, we can use .select_df().

In [12]:
id name age is_old
0 1 user_0 0.844422 True
1 2 user_1 0.757954 True
2 3 user_2 0.420572 False
3 4 user_3 0.258917 False
4 5 user_4 0.511275 True

Now we can select specific elements of the db using the where argument of the .select() method.

In [13]:['is_old']==True)
[(1, 'user_0', 0.8444218515250481, True),
 (2, 'user_1', 0.7579544029403025, True),
 (5, 'user_4', 0.5112747213686085, True)]
In [14]:['id']==3)
[(3, 'user_2', 0.420571580830845, False)]

We can update the results in a similar way, using the where argument.

In [15]:
db.update({'name':'smartypants'}, where=db['id']==3)
[(1, 'user_0', 0.8444218515250481, True),
 (2, 'user_1', 0.7579544029403025, True),
 (3, 'smartypants', 0.420571580830845, False),
 (4, 'user_3', 0.25891675029296335, False),
 (5, 'user_4', 0.5112747213686085, True)]
In [16]:
[(1, 'user_0', 84.4421851525048, True),
 (2, 'user_1', 75.79544029403024, True),
 (3, 'smartypants', 42.0571580830845, False),
 (4, 'user_3', 25.891675029296334, False),
 (5, 'user_4', 51.12747213686085, True)]

And we can delete elements using the .delete() method.

In [17]:
[(1, 'user_0', 84.4421851525048, True),
 (2, 'user_1', 75.79544029403024, True),
 (4, 'user_3', 25.891675029296334, False),
 (5, 'user_4', 51.12747213686085, True)]

Notes on DB Interface

DocTable2 allows you to access columns through direct subscripting, then relies on the power of sqlalchemy column objects to do most of the work of constructing queries. Here are a few notes on their use. For more demonstration, see the example in examples/dt2_select.ipynb

In [18]:
# subscript is used to access underlying sqlalchemy column reference (without querying data)
Column('id', Integer(), table=<_documents_>, primary_key=True, nullable=False)
In [19]:
# conditionals are applied directly to the column objects (as we'll see with "where" clause)
db['id'] < 3
<sqlalchemy.sql.elements.BinaryExpression object at 0x7f4efa3199b0>
In [20]:
# can also access using .col() method
Column('id', Integer(), table=<_documents_>, primary_key=True, nullable=False)
In [21]:
# to access all column objects (only useful for working directly with sql info)
<sqlalchemy.sql.base.ImmutableColumnCollection at 0x7f4efa3882d0>
In [23]:
# to access more detailed schema information
name type nullable default autoincrement primary_key
0 id INTEGER False None auto 1
1 name VARCHAR False None auto 0
2 age INTEGER True None auto 0
3 is_old BOOLEAN True None auto 0
In [24]:
# If needed, you can also access the sqlalchemy table object using the .table property.
Table('_documents_', MetaData(bind=Engine(sqlite:///:memory:)), Column('id', Integer(), table=<_documents_>, primary_key=True, nullable=False), Column('name', String(), table=<_documents_>, nullable=False), Column('age', Integer(), table=<_documents_>), Column('is_old', Boolean(), table=<_documents_>), schema=None)
In [25]:
# the count method is also an easy way to count rows in the database
In [26]:
# the print method makes it easy to see the table name and total row count
<DocTable::sqlite:///:memory::_documents_ ct: 4>

Type Mappings

DocTable2 provides a simplified interface into the SQLAlchemy core package component (not the object-relational mapping component). With this interface DocTable2 is able to provide an object-oriented interface to execute SQL commands. This package simplifies that interface by working with the various objects within the class, allowing the user to create schemas and perform queries without working with the hundreds of classes required by SQLAlchemy core.

Because of this, it is important to note the interface between them. The first is the type map used to set up the schema. The DocTable2 constructor provides a schema interface which accepts strings as types, so the type map appears here:

In [32]:
{'biginteger': sqlalchemy.sql.sqltypes.BigInteger,
 'boolean': sqlalchemy.sql.sqltypes.Boolean,
 'date': sqlalchemy.sql.sqltypes.Date,
 'datetime': sqlalchemy.sql.sqltypes.DateTime,
 'enum': sqlalchemy.sql.sqltypes.Enum,
 'float': sqlalchemy.sql.sqltypes.Float,
 'integer': sqlalchemy.sql.sqltypes.Integer,
 'interval': sqlalchemy.sql.sqltypes.Interval,
 'largebinary': sqlalchemy.sql.sqltypes.LargeBinary,
 'numeric': sqlalchemy.sql.sqltypes.Numeric,
 'smallinteger': sqlalchemy.sql.sqltypes.SmallInteger,
 'string': sqlalchemy.sql.sqltypes.String,
 'text': sqlalchemy.sql.sqltypes.Text,
 'time': sqlalchemy.sql.sqltypes.Time,
 'unicode': sqlalchemy.sql.sqltypes.Unicode,
 'unicodetext': sqlalchemy.sql.sqltypes.UnicodeText,
 'json': doctable.coltypes.JSONType,
 'pickle': doctable.coltypes.CpickleType,
 'parsetree': doctable.coltypes.ParseTreeType,
 'picklefile': doctable.coltypes.PickleFileType,
 'textfile': doctable.coltypes.TextFileType}