Use a JSON library to feed FillableArray. #19

Merged: 49 commits, Nov 3, 2019
170ce13
Start use-json-library PR.
jpivarski Oct 22, 2019
f3425b7
Remove simdjson submodule because it requires C++17 (and Awkward targ…
jpivarski Oct 22, 2019
51c032f
Use RapidJSON instead.
jpivarski Oct 22, 2019
b77adad
We now know everything we need to about RapidJSON.
jpivarski Oct 22, 2019
4bdb265
Skeleton for JSON methods that hide the choice of JSON library.
jpivarski Oct 23, 2019
01dfe63
FromJsonString and FromJsonFile compile.
jpivarski Oct 23, 2019
b3fa75f
First successful tests (FromJsonString and FromJsonFile).
jpivarski Oct 23, 2019
e3968ef
Better error handling.
jpivarski Oct 23, 2019
bb04314
Content::tojson infrastructure compiles.
jpivarski Oct 23, 2019
7857387
More Content::tojson infrastructure compiles.
jpivarski Oct 23, 2019
eeb1282
Yet more Content::tojson infrastructure compiles.
jpivarski Oct 23, 2019
327d1f7
All tojson compiles, but none of it is tested yet.
jpivarski Oct 23, 2019
0538a4c
[skip ci] Make big datasets for testing.
jpivarski Oct 23, 2019
25a0795
Default pretty=False and add a handle to Writer::SetMaxDecimalPlaces.
jpivarski Oct 23, 2019
adf395d
Compile against RNTuple.
jpivarski Oct 23, 2019
e915b15
Better compilation line.
jpivarski Oct 23, 2019
63f053a
How to write triply jagged data in a TTree.
jpivarski Oct 23, 2019
65a51c1
The 'tojson' methods now have tests.
jpivarski Oct 23, 2019
9962237
Implemented but have not tested 'fromiter'.
jpivarski Oct 23, 2019
39189a3
Merge branch 'feature/use-json-library' of https://github.com/scikit-…
jpivarski Oct 23, 2019
ad907da
Wrote TTrees and they are correct.
jpivarski Oct 24, 2019
7fa0487
Merge branch 'feature/use-json-library' of https://github.com/scikit-…
jpivarski Oct 24, 2019
816e0f9
Expose FillableArray options in 'fromiter'.
jpivarski Oct 25, 2019
0a5be84
[skip ci] Do it right.
jpivarski Oct 25, 2019
e27314b
Keep track of studies for CHEP 2019.
jpivarski Oct 25, 2019
3db57d5
Conversion of ROOT's nested vectors of numbers has been implemented b…
jpivarski Oct 25, 2019
a3b4937
'fromroot_nestedvector' works on a synthetic test case.
jpivarski Oct 25, 2019
a28a257
'fromroot_nestedvector' works on real ROOT data (remember to skip the…
jpivarski Oct 25, 2019
0086c3b
TTree reading times.
jpivarski Oct 25, 2019
9c27b8f
First results.
jpivarski Oct 25, 2019
3c87eef
Beautify plot.
jpivarski Oct 25, 2019
05d271a
White background for plot.
jpivarski Oct 25, 2019
b498218
Add a test that's used in the CHEP 2019 presentation.
jpivarski Oct 26, 2019
0501770
Move type-checking logic of 'fromiter' into pybind11 C++.
jpivarski Oct 27, 2019
af99fdc
Simplify CHEP 2019 example by moving array-building into 'fill' method.
jpivarski Oct 27, 2019
018702b
Updated CHEP 2019 example.
jpivarski Oct 27, 2019
04ac782
Ignore data samples.
jpivarski Oct 27, 2019
8c83786
Change 'Error' into 'struct Error' everywhere.
jpivarski Oct 28, 2019
6a91652
Try to fix compilation errors on Windows.
jpivarski Oct 28, 2019
4bdad89
Try to fix a few more compilation errors on Windows.
jpivarski Oct 28, 2019
468cf85
Try to fix yet a few more compilation errors on Windows.
jpivarski Oct 28, 2019
35d9aa2
Try to fix yet-yet a few more compilation errors on Windows.
jpivarski Oct 28, 2019
5d46675
Wrote chep2019-studies-3.cpp.
jpivarski Oct 28, 2019
9875bce
Add RNTuple measurements to CHEP 2019.
jpivarski Oct 28, 2019
96f576f
Start adding plotting.
jpivarski Oct 28, 2019
d0913fb
Update the plots.
jpivarski Oct 28, 2019
6f03664
Consolidate files relevant for CHEP 2019 into one directory.
jpivarski Oct 28, 2019
cd27f33
gitignore
jpivarski Oct 28, 2019
62afebc
Don't forget numexpr.
jpivarski Nov 3, 2019
1 change: 1 addition & 0 deletions .ci/azure-buildtest-awkward.yml
@@ -10,6 +10,7 @@ trigger:
- .ci/azure-deploy-awkward.yml
- .ci/linux-build.sh
- docs/*
- studies/*

pr:
branches:
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,3 +1,5 @@
studies/**/sample-*

############################################################# IDEs

# ...
6 changes: 3 additions & 3 deletions .gitmodules
@@ -1,6 +1,6 @@
[submodule "pybind11"]
path = pybind11
url = https://github.com/pybind/pybind11.git
-[submodule "simdjson"]
-path = simdjson
-url = https://github.com/lemire/simdjson.git
+[submodule "rapidjson"]
+path = rapidjson
+url = https://github.com/Tencent/rapidjson.git
4 changes: 3 additions & 1 deletion CMakeLists.txt
@@ -19,8 +19,9 @@ add_definitions(-DVERSION_INFO="${VERSION_INFO}")
set(CMAKE_MACOSX_RPATH 1)

file(GLOB CPU_KERNEL_SOURCES "src/cpu-kernels/*.cpp")
-file(GLOB LIBAWKWARD_SOURCES "src/libawkward/*.cpp" "src/libawkward/array/*.cpp" "src/libawkward/fillable/*.cpp" "src/libawkward/type/*.cpp")
+file(GLOB LIBAWKWARD_SOURCES "src/libawkward/*.cpp" "src/libawkward/array/*.cpp" "src/libawkward/fillable/*.cpp" "src/libawkward/type/*.cpp" "src/libawkward/io/*.cpp")
 include_directories(include)
+include_directories(rapidjson/include)

add_subdirectory(pybind11)

@@ -44,6 +45,7 @@ add_library(awkward SHARED $<TARGET_OBJECTS:awkward-objects>)
target_link_libraries(awkward-static PRIVATE awkward-cpu-kernels-static)
target_link_libraries(awkward PRIVATE awkward-cpu-kernels-static)
addtest(PR016 tests/test_PR016_finish_getitem_for_rawarray.cpp)
addtest(PR019 tests/test_PR019_use_json_library.cpp)

pybind11_add_module(layout src/pyawkward.cpp)
set_target_properties(layout PROPERTIES CXX_VISIBILITY_PRESET default)
1 change: 1 addition & 0 deletions README.md
@@ -119,6 +119,7 @@ Completed items are ☑check-marked. See [closed PRs](https://github.com/scikit-
* [ ] Translation to and from Apache Arrow and Parquet in C++.
* [ ] Persistence to any medium that stores named binary blobs, as before, but accessible via C++ (especially for writing). The persistence format might differ slightly from the existing one (break backward compatibility, if needed).
* [ ] Universal `array.get[...]` as a softer form of `array[...]` that skips non-existent indexes, rather than raising errors.
* [ ] Explicit interface with [NumExpr](https://numexpr.readthedocs.io/en/latest/index.html).

### At some point in the future

1 change: 1 addition & 0 deletions awkward1/__init__.py
@@ -2,6 +2,7 @@

import awkward1.layout
import awkward1._numba

from awkward1.operations.convert import *

__version__ = awkward1.layout.__version__
31 changes: 30 additions & 1 deletion awkward1/operations/convert.py
@@ -1,12 +1,23 @@
# BSD 3-Clause License; see https://github.com/jpivarski/awkward-1.0/blob/master/LICENSE

import numbers
import json
try:
    from collections.abc import Iterable
except ImportError:
    from collections import Iterable

import numpy

import awkward1.util
import awkward1.layout

def fromiter(iterable, initial=1024, resize=2.0):
    out = awkward1.layout.FillableArray(initial=initial, resize=resize)
    for x in iterable:
        out.fill(x)
    return out.snapshot()

def tolist(array):
    if array is None or isinstance(array, (bool, str, bytes, numbers.Number)):
        return array
@@ -26,4 +37,22 @@ def tolist(array):
    else:
        raise TypeError("unrecognized array type: {0}".format(repr(array)))

-__all__ = [x for x in list(globals()) if not x.startswith("_") and x not in ("awkward1", "numpy")]
+fromjson = awkward1.layout.fromjson
+
+def tojson(array, *args, **kwargs):
+    if array is None or isinstance(array, (bool, str, bytes, numbers.Number)):
+        return json.dumps(array)
+
+    elif isinstance(array, numpy.ndarray):
+        return awkward1.layout.NumpyArray(array).tojson(*args, **kwargs)
+
+    elif isinstance(array, awkward1.layout.FillableArray):
+        return array.snapshot().tojson(*args, **kwargs)
+
+    elif isinstance(array, awkward1.layout.Content):
+        return array.tojson(*args, **kwargs)
+
+    else:
+        raise TypeError("unrecognized array type: {0}".format(repr(array)))
+
+__all__ = [x for x in list(globals()) if not x.startswith("_") and x not in ("numbers", "json", "Iterable", "numpy", "awkward1")]
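
The `initial` and `resize` arguments of `fromiter` are forwarded unchanged to `FillableArray`. As a rough illustration only, and assuming they mean an initial buffer capacity and a geometric growth factor (the C++ side is not shown in this diff), a pure-Python stand-in might look like the sketch below. `GrowableBuffer` and `fromiter_sketch` are hypothetical names, not part of awkward1.

```python
import math


class GrowableBuffer:
    """Hypothetical stand-in for a fillable buffer; not awkward1's implementation."""

    def __init__(self, initial=1024, resize=2.0):
        if initial < 1 or resize <= 1.0:
            raise ValueError("need initial >= 1 and resize > 1.0")
        self._data = [None] * initial   # preallocated storage
        self._length = 0                # number of filled slots
        self._resize = resize

    @property
    def capacity(self):
        return len(self._data)

    def fill(self, x):
        if self._length == len(self._data):
            # Grow geometrically so that appends are amortized O(1).
            grown = int(math.ceil(len(self._data) * self._resize))
            newcap = max(len(self._data) + 1, grown)
            self._data.extend([None] * (newcap - len(self._data)))
        self._data[self._length] = x
        self._length += 1

    def snapshot(self):
        # Copy out only the filled part of the buffer.
        return self._data[:self._length]


def fromiter_sketch(iterable, initial=1024, resize=2.0):
    # Same shape as the fromiter above: fill one item at a time, then snapshot.
    out = GrowableBuffer(initial=initial, resize=resize)
    for x in iterable:
        out.fill(x)
    return out.snapshot()
```

Under these assumptions, a larger `initial` trades memory for fewer reallocations, and a `resize` closer to 1.0 does the opposite.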
6 changes: 6 additions & 0 deletions include/awkward/Content.h
@@ -3,9 +3,12 @@
#ifndef AWKWARD_CONTENT_H_
#define AWKWARD_CONTENT_H_

#include <cstdio>

#include "awkward/cpu-kernels/util.h"
#include "awkward/Identity.h"
#include "awkward/Slice.h"
#include "awkward/io/json.h"

namespace awkward {
class Content {
@@ -17,6 +20,7 @@ namespace awkward {
virtual void setid() = 0;
virtual void setid(const std::shared_ptr<Identity> id) = 0;
virtual const std::string tostring_part(const std::string indent, const std::string pre, const std::string post) const = 0;
virtual void tojson_part(ToJson& builder) const = 0;
virtual int64_t length() const = 0;
virtual const std::shared_ptr<Content> shallow_copy() const = 0;
virtual void checksafe() const = 0;
@@ -30,6 +34,8 @@ namespace awkward {
virtual const std::pair<int64_t, int64_t> minmax_depth() const = 0;

const std::string tostring() const;
const std::string tojson(bool pretty, int64_t maxdecimals) const;
void tojson(FILE* destination, bool pretty, int64_t maxdecimals, int64_t buffersize) const;
const std::shared_ptr<Content> getitem_ellipsis(const Slice& tail, const Index64& advanced) const;
const std::shared_ptr<Content> getitem_newaxis(const Slice& tail, const Index64& advanced) const;
};
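
The `tojson_part(ToJson& builder)` hook added to `Content` suggests an event-style builder: each array node pushes its values into a shared builder, and the public `tojson` wraps the result. The sketch below shows that pattern in plain Python rather than the PR's C++. Only `boolean`, `integer` and `real` appear in this diff; `beginlist`, `endlist`, `tostring`, the class name and the offsets-walking function are assumptions made for illustration.

```python
import json


class ToJsonSketch:
    """Hypothetical event-style builder that collects values into nested lists."""

    def __init__(self):
        self._stack = [[]]   # the root container holds the top-level value

    def boolean(self, x):
        self._stack[-1].append(bool(x))

    def integer(self, x):
        self._stack[-1].append(int(x))

    def real(self, x):
        self._stack[-1].append(float(x))

    def beginlist(self):     # assumed method name
        self._stack.append([])

    def endlist(self):       # assumed method name
        finished = self._stack.pop()
        self._stack[-1].append(finished)

    def tostring(self):
        return json.dumps(self._stack[0][0])


def listoffset_tojson_part(builder, offsets, content):
    """Walk a jagged array stored as offsets plus flat content."""
    builder.beginlist()
    for i in range(len(offsets) - 1):
        builder.beginlist()
        for x in content[offsets[i]:offsets[i + 1]]:
            builder.real(x)
        builder.endlist()
    builder.endlist()


builder = ToJsonSketch()
listoffset_tojson_part(builder, [0, 3, 3, 5], [1.1, 2.2, 3.3, 4.4, 5.5])
print(builder.tostring())   # [[1.1, 2.2, 3.3], [], [4.4, 5.5]]
```

Keeping the builder behind a small interface like this is what lets the choice of JSON library stay hidden from the array classes.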
1 change: 1 addition & 0 deletions include/awkward/Index.h
@@ -35,6 +35,7 @@ namespace awkward {
const std::string tostring_part(const std::string indent, const std::string pre, const std::string post) const;
T getitem_at(int64_t at) const;
T getitem_at_unsafe(int64_t at) const;
void setitem_at_unsafe(int64_t at, T value) const;
IndexOf<T> getitem_range(int64_t start, int64_t stop) const;
IndexOf<T> getitem_range_unsafe(int64_t start, int64_t stop) const;
virtual const std::shared_ptr<Index> shallow_copy() const;
8 changes: 7 additions & 1 deletion include/awkward/Slice.h
@@ -36,7 +36,7 @@ namespace awkward {

class SliceRange: public SliceItem {
public:
-SliceRange(int64_t start, int64_t stop, int64_t step): start_(start), stop_(stop), step_(step) {
+SliceRange(int64_t start, int64_t stop, int64_t step): start_(start), stop_(stop), step_(step == none() ? 1 : step) {
assert(step_ != 0);
}
int64_t start() const { return start_; }
@@ -114,6 +114,12 @@ namespace awkward {
const Slice tail() const;
const std::string tostring() const;
void append(const std::shared_ptr<SliceItem>& item);
void append(const SliceAt& item);
void append(const SliceRange& item);
void append(const SliceEllipsis& item);
void append(const SliceNewAxis& item);
template <typename T>
void append(const SliceArrayOf<T>& item);
void become_sealed();
bool isadvanced() const;
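
The `SliceRange` constructor change in this file maps a missing step (`none()`) to 1 before asserting that the step is nonzero. This mirrors how Python's built-in `slice` resolves a `None` step. A small check in plain Python, where `regularize_step` is a hypothetical helper and Python's `None` stands in for the C++ `none()` sentinel:

```python
def regularize_step(step):
    """Mimic the constructor's rule: a missing step means 1; zero is never allowed."""
    step = 1 if step is None else step
    assert step != 0
    return step


# Python's own slice type applies the same default when the step is None.
start, stop, step = slice(2, 8, None).indices(10)
```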

1 change: 1 addition & 0 deletions include/awkward/array/ListArray.h
@@ -29,6 +29,7 @@ namespace awkward {
virtual void setid();
virtual void setid(const std::shared_ptr<Identity> id);
virtual const std::string tostring_part(const std::string indent, const std::string pre, const std::string post) const;
virtual void tojson_part(ToJson& builder) const;
virtual int64_t length() const;
virtual const std::shared_ptr<Content> shallow_copy() const;
virtual void checksafe() const;
1 change: 1 addition & 0 deletions include/awkward/array/ListOffsetArray.h
@@ -27,6 +27,7 @@ namespace awkward {
virtual void setid();
virtual void setid(const std::shared_ptr<Identity> id);
virtual const std::string tostring_part(const std::string indent, const std::string pre, const std::string post) const;
virtual void tojson_part(ToJson& builder) const;
virtual int64_t length() const;
virtual const std::shared_ptr<Content> shallow_copy() const;
virtual void checksafe() const;
2 changes: 2 additions & 0 deletions include/awkward/array/NumpyArray.h
@@ -37,6 +37,7 @@ namespace awkward {
bool isscalar() const;
bool isempty() const;
void* byteptr() const;
void* byteptr(ssize_t at) const;
ssize_t bytelength() const;
uint8_t getbyte(ssize_t at) const;

@@ -45,6 +46,7 @@ namespace awkward {
virtual void setid();
virtual void setid(const std::shared_ptr<Identity> id);
virtual const std::string tostring_part(const std::string indent, const std::string pre, const std::string post) const;
virtual void tojson_part(ToJson& builder) const;
virtual int64_t length() const;
virtual const std::shared_ptr<Content> shallow_copy() const;
virtual void checksafe() const;
60 changes: 58 additions & 2 deletions include/awkward/array/RawArray.h
@@ -20,6 +20,26 @@
#include "awkward/Content.h"

namespace awkward {
// 'inline' because this non-template function is defined in a header.
inline void tojson_boolean(ToJson& builder, bool* array, int64_t length) {
for (int64_t i = 0; i < length; i++) {
builder.boolean((bool)array[i]);
}
}

template <typename T>
void tojson_integer(ToJson& builder, T* array, int64_t length) {
for (int64_t i = 0; i < length; i++) {
builder.integer((int64_t)array[i]);
}
}

template <typename T>
void tojson_real(ToJson& builder, T* array, int64_t length) {
for (int64_t i = 0; i < length; i++) {
builder.real((double)array[i]);
}
}

template <typename T>
class RawArrayOf: public Content {
public:
@@ -123,6 +143,42 @@ namespace awkward {
return out.str();
}

virtual void tojson_part(ToJson& builder) const {
if (std::is_same<T, double>::value) {
tojson_real(builder, reinterpret_cast<double*>(byteptr()), length());
}
else if (std::is_same<T, float>::value) {
tojson_real(builder, reinterpret_cast<float*>(byteptr()), length());
}
else if (std::is_same<T, int64_t>::value) {
tojson_integer(builder, reinterpret_cast<int64_t*>(byteptr()), length());
}
else if (std::is_same<T, uint64_t>::value) {
tojson_integer(builder, reinterpret_cast<uint64_t*>(byteptr()), length());
}
else if (std::is_same<T, int32_t>::value) {
tojson_integer(builder, reinterpret_cast<int32_t*>(byteptr()), length());
}
else if (std::is_same<T, uint32_t>::value) {
tojson_integer(builder, reinterpret_cast<uint32_t*>(byteptr()), length());
}
else if (std::is_same<T, int16_t>::value) {
tojson_integer(builder, reinterpret_cast<int16_t*>(byteptr()), length());
}
else if (std::is_same<T, uint16_t>::value) {
tojson_integer(builder, reinterpret_cast<uint16_t*>(byteptr()), length());
}
else if (std::is_same<T, int8_t>::value) {
tojson_integer(builder, reinterpret_cast<int8_t*>(byteptr()), length());
}
else if (std::is_same<T, uint8_t>::value) {
tojson_integer(builder, reinterpret_cast<uint8_t*>(byteptr()), length());
}
else {
throw std::invalid_argument(std::string("cannot convert RawArrayOf<") + typeid(T).name() + std::string("> into JSON"));
}
}

virtual int64_t length() const { return length_; }

virtual const std::shared_ptr<Content> shallow_copy() const { return std::shared_ptr<Content>(new RawArrayOf<T>(id_, ptr_, offset_, length_, itemsize_)); }
@@ -232,7 +288,7 @@ namespace awkward {
throw std::runtime_error("array.ndim != 1");
}
Index64 flathead = array->ravel();
-Error err = awkward_regularize_arrayslice_64(
+struct Error err = awkward_regularize_arrayslice_64(
flathead.ptr().get(),
flathead.length(),
length_);
@@ -247,7 +303,7 @@ namespace awkward {

virtual const std::shared_ptr<Content> carry(const Index64& carry) const {
std::shared_ptr<T> ptr(new T[(size_t)carry.length()], awkward::util::array_deleter<T>());
-Error err = awkward_numpyarray_getitem_next_null_64(
+struct Error err = awkward_numpyarray_getitem_next_null_64(
reinterpret_cast<uint8_t*>(ptr.get()),
reinterpret_cast<uint8_t*>(ptr_.get()),
carry.length(),
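
The `tojson_part` dispatch in this file decides, per element type, whether values reach the builder through `integer` or through `real`. A small check of why that choice matters for 64-bit integers, written in plain Python rather than the PR's C++: values above 2^53 do not survive a cast to `double`. The helper name `roundtrips` is hypothetical.

```python
import json

big = 2**53 + 1                      # fits in int64, but not exactly in a double

as_integer = json.dumps(big)         # what an integer writer would emit
as_real = json.dumps(float(big))     # what a cast to double would emit


def roundtrips(text, original):
    """True if parsing the JSON text gives back exactly the original value."""
    return json.loads(text) == original
```

Here `roundtrips(as_integer, big)` holds while `roundtrips(as_real, big)` does not, because the nearest double to `big` is 2^53.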