| | DeliRoll | Amazon Redshift |
|---|---|---|
| Instance type | cr1.8xlarge | Redshift 8XL |
| Instances | 1 | 3 |
| Cores | 16 | 32 |
| RAM | 244GB | 240GB |
| Cost / Month | $2,520 | $9,792 |
SELECT sum(column) FROM table

| Redshift | 1.9s |
|---|---|
| DeliRoll | 0.6s |
SELECT sum(column) FROM table GROUP BY dimension

| Redshift | 21.6s |
|---|---|
| DeliRoll | 17.4s |
SELECT sum(col1),sum(col2),...,sum(col7) FROM table

| Redshift | 4.8s |
|---|---|
| DeliRoll | 0.7s |
SELECT sum(col1),sum(col2),...,sum(col250) FROM table

| Redshift | 8 minutes! |
|---|---|
| DeliRoll | 1.9s |
> "Bad programmers worry about the code. Good programmers worry about data structures and their relationships."
>
> -- Linus Torvalds
{"cookie_set": true, "country": "Sweden", "cost": 840}
{"cookie_set": false, "country": "UK", "cost": 23}
{"cookie_set": true, "country": "China", "cost": 11}

After encoding, every value is replaced with an integer:

{"cookie_set": integer, "country": integer, "cost": integer}
{"cookie_set": integer, "country": integer, "cost": integer}
{"cookie_set": integer, "country": integer, "cost": integer}
Compression ratio dropped from 11:1 to 1.5:1
The dense encoding results in faster computation
The dense encoding is partially
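The value-to-integer rewrite shown above is dictionary encoding: each field keeps a mapping from distinct values to dense integer ids. A minimal sketch of the idea (function and variable names are illustrative, not DeliRoll's actual code):

```python
def dict_encode(records):
    """Replace every value with a small integer id, one dictionary per field."""
    dicts = {}     # field -> {value: integer id}
    encoded = []
    for rec in records:
        row = {}
        for field, value in rec.items():
            mapping = dicts.setdefault(field, {})
            # First time we see a value, assign it the next free id.
            row[field] = mapping.setdefault(value, len(mapping))
        encoded.append(row)
    return encoded, dicts

rows = [
    {"cookie_set": True,  "country": "Sweden", "cost": 840},
    {"cookie_set": False, "country": "UK",     "cost": 23},
    {"cookie_set": True,  "country": "China",  "cost": 11},
]
enc, dicts = dict_encode(rows)
# enc[2] == {"cookie_set": 0, "country": 2, "cost": 2}
```

Repeated values (like a second Swedish row) reuse the same id, which is why the integer columns compute fast but compress worse than the highly redundant raw JSON.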
Numba is a just-in-time specializing compiler that compiles annotated Python and NumPy code to LLVM.
from numba import autojit

@autojit
def sum2d(arr):
    M, N = arr.shape
    result = 0.0
    for i in range(M):
        for j in range(N):
            result += arr[i, j]
    return result
with self.DEF([(m, types[m]) for m in args]):
    self.INIT_SPARSEVAR(score_metrics)
    with self.FOR_ALL_ROWS(has_row_filter):
        with self.ROW_MATCHES(flags, where_expr, score_metrics, partitioned):
            self.ASSIGN_METRICS(score_metrics)
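The DEF/FOR_ALL_ROWS/ROW_MATCHES blocks above read like context managers that emit and indent lines of generated source. A minimal sketch of that pattern (class and method names are hypothetical, not DeliRoll's):

```python
import contextlib

class CodeGen:
    """Emit indented source lines; nested `with` blocks deepen the indent."""
    def __init__(self):
        self.lines = []
        self.depth = 0

    def emit(self, line):
        self.lines.append('    ' * self.depth + line)

    @contextlib.contextmanager
    def block(self, header):
        self.emit(header)
        self.depth += 1
        yield
        self.depth -= 1

    def source(self):
        return '\n'.join(self.lines)

g = CodeGen()
with g.block('def query(rows):'):
    g.emit('total = 0')
    with g.block('for row in rows:'):
        with g.block('if row.matches:'):
            g.emit('total += row.metric')
    g.emit('return total')
src = g.source()
```

The payoff is that the query planner stays ordinary, readable Python while the output is a flat, specialized function that can be handed to a compiler such as Numba.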
while (_indptr_idx < _indptr_len):
    while (True):
        _byte = _indptr[_indptr_idx]
        if (_byte & 1):
            _indptr_row += (_byte >> 1) + 1
        _indptr_val |= (_byte >> 1) << _shift
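The generated loop above suggests a byte-oriented varint stream where the low bit of each byte marks a row-skip byte and the remaining seven bits carry the payload. A round-trip sketch of one such format (this low-bit convention is a guess consistent with the excerpt; the actual DeliRoll layout is not shown):

```python
def encode(pairs):
    """Encode sorted (row, value) pairs: one row-skip byte (low bit set,
    assumes skips < 128), then 7-bit little-endian value bytes (low bit clear)."""
    prev = -1
    out = bytearray()
    for row, val in pairs:
        skip = row - prev - 1
        out.append((skip << 1) | 1)
        prev = row
        while True:
            out.append((val & 0x7f) << 1)
            val >>= 7
            if not val:
                break
    return bytes(out)

def decode(stream):
    """Invert encode(): rebuild the (row, value) pairs."""
    row = -1
    out = []
    i = 0
    while i < len(stream):
        b = stream[i]
        i += 1
        if b & 1:                      # row-skip marker
            row += (b >> 1) + 1
            val, shift = 0, 0
            while i < len(stream) and not (stream[i] & 1):
                val |= (stream[i] >> 1) << shift
                shift += 7
                i += 1
            out.append((row, val))
    return out

pairs = [(0, 5), (3, 300)]
assert decode(encode(pairs)) == pairs
```

Keeping the decoder as straight-line integer arithmetic is what lets a tracing compiler turn it into the tight LLVM IR shown next.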
while_condition_20:15:
    %_shift_2 = phi i32
    %_indptr_idx_3 = phi i64
while_body_21:12:
    %81 = mul i64 %_indptr_idx_3, %29
    %82 = add i64 0, %81
    %83 = getelementptr i8* %21, i64 %82
Multicorn lets you access any data source from your PostgreSQL database by writing foreign data wrappers in Python.
CREATE FOREIGN TABLE constanttable (
    test character varying,
    test2 character varying
) SERVER multicorn_srv OPTIONS (
    wrapper 'myfdw.ConstantForeignDataWrapper'
);

SELECT * FROM constanttable;
from multicorn import ForeignDataWrapper

class ConstantForeignDataWrapper(ForeignDataWrapper):
    def __init__(self, options, columns):
        super(ConstantForeignDataWrapper, self).__init__(options, columns)
        self.columns = columns

    def execute(self, quals, columns):
        # Yield one dict per row, keyed by column name.
        for index in range(4):
            line = {}
            for column_name in self.columns:
                line[column_name] = '%s %s' % (column_name, index)
            yield line
test | test2
---------+----------
test 0 | test2 0
test 1 | test2 1
test 2 | test2 2
test 3 | test2 3
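The generator itself needs no database to run; outside PostgreSQL you can exercise it by substituting a stand-in base class for `multicorn.ForeignDataWrapper` (a testing stub, not part of Multicorn):

```python
class ForeignDataWrapper(object):
    """Stand-in for multicorn.ForeignDataWrapper, for testing outside PostgreSQL."""
    def __init__(self, options, columns):
        self.columns = columns

class ConstantForeignDataWrapper(ForeignDataWrapper):
    def execute(self, quals, columns):
        # Same four constant rows the psql output above shows.
        for index in range(4):
            yield {name: '%s %s' % (name, index) for name in self.columns}

fdw = ConstantForeignDataWrapper({}, ['test', 'test2'])
rows = list(fdw.execute([], ['test', 'test2']))
# rows[0] == {'test': 'test 0', 'test2': 'test2 0'}
```

In a real wrapper, `quals` carries the WHERE-clause conditions PostgreSQL pushes down, so the Python side can skip rows before they ever cross into the database.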
That's the whole point.
You can use a high-level language to quickly implement domain-specific solutions that outperform generic solutions, regardless of the language they use.
You dare to do things that would be too painful to do with a lower-level language.
Python, Erlang, Big Data, AWS