Use a JSON library to feed FillableArray. (#19)
Contrary to earlier plans, I used RapidJSON rather than simdjson, for compatibility: simdjson requires C++17, while Awkward targets C++11.

* Start use-json-library PR.

* Remove simdjson submodule because it requires C++17 (and Awkward targets C++11).

* Use RapidJSON instead.

* We now know everything we need to about RapidJSON.

* Skeleton for JSON methods that hide the choice of JSON library.

* FromJsonString and FromJsonFile compile.

* First successful tests (FromJsonString and FromJsonFile).

* Better error handling.

* Content::tojson infrastructure compiles.

* More Content::tojson infrastructure compiles.

* Yet more Content::tojson infrastructure compiles.

* All tojson compiles, but none of it is tested yet.

* [skip ci] Make big datasets for testing.

* Default pretty=False and add a handle to Writer::SetMaxDecimalPlaces.

* Compile against RNTuple.

* Better compilation line.

* How to write triply jagged data in a TTree.

* The 'tojson' methods now have tests.

* Implemented but have not tested 'fromiter'.

* Wrote TTrees and they are correct.

* Expose FillableArray options in 'fromiter'.

* [skip ci] Do it right.

* Keep track of studies for CHEP 2019.

* Conversion of ROOT's nested vectors of numbers has been implemented but is untested.

* 'fromroot_nestedvector' works on a synthetic test case.

* 'fromroot_nestedvector' works on real ROOT data (remember to skip the first 6 bytes!).

* TTree reading times.

* First results.

* Beautify plot.

* White background for plot.

* Add a test that's used in the CHEP 2019 presentation.

* Move type-checking logic of 'fromiter' into pybind11 C++.

* Simplify CHEP 2019 example by moving array-building into 'fill' method.

* Updated CHEP 2019 example.

* Ignore data samples.

* Change 'Error' into 'struct Error' everywhere.

* Try to fix compilation errors on Windows.

* Try to fix a few more compilation errors on Windows.

* Try to fix yet a few more compilation errors on Windows.

* Try to fix yet-yet a few more compilation errors on Windows.

* Wrote chep2019-studies-3.cpp.

* Add RNTuple measurements to CHEP 2019.

* Start adding plotting.

* Update the plots.

* Consolidate files relevant for CHEP 2019 into one directory.

* gitignore

* Don't forget numexpr.
jpivarski authored Nov 3, 2019
1 parent 8f2018a commit edbfcd5
Showing 62 changed files with 2,553 additions and 174 deletions.
1 change: 1 addition & 0 deletions .ci/azure-buildtest-awkward.yml
@@ -10,6 +10,7 @@ trigger:
- .ci/azure-deploy-awkward.yml
- .ci/linux-build.sh
- docs/*
- studies/*

pr:
branches:
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,3 +1,5 @@
studies/**/sample-*

############################################################# IDEs

# ...
6 changes: 3 additions & 3 deletions .gitmodules
@@ -1,6 +1,6 @@
[submodule "pybind11"]
path = pybind11
url = https://github.com/pybind/pybind11.git
[submodule "simdjson"]
path = simdjson
url = https://github.com/lemire/simdjson.git
[submodule "rapidjson"]
path = rapidjson
url = https://github.com/Tencent/rapidjson.git
4 changes: 3 additions & 1 deletion CMakeLists.txt
@@ -19,8 +19,9 @@ add_definitions(-DVERSION_INFO="${VERSION_INFO}")
set(CMAKE_MACOSX_RPATH 1)

file(GLOB CPU_KERNEL_SOURCES "src/cpu-kernels/*.cpp")
file(GLOB LIBAWKWARD_SOURCES "src/libawkward/*.cpp" "src/libawkward/array/*.cpp" "src/libawkward/fillable/*.cpp" "src/libawkward/type/*.cpp")
file(GLOB LIBAWKWARD_SOURCES "src/libawkward/*.cpp" "src/libawkward/array/*.cpp" "src/libawkward/fillable/*.cpp" "src/libawkward/type/*.cpp" "src/libawkward/io/*.cpp")
include_directories(include)
include_directories(rapidjson/include)

add_subdirectory(pybind11)

@@ -44,6 +45,7 @@ add_library(awkward SHARED $<TARGET_OBJECTS:awkward-objects>)
target_link_libraries(awkward-static PRIVATE awkward-cpu-kernels-static)
target_link_libraries(awkward PRIVATE awkward-cpu-kernels-static)
addtest(PR016 tests/test_PR016_finish_getitem_for_rawarray.cpp)
addtest(PR019 tests/test_PR019_use_json_library.cpp)

pybind11_add_module(layout src/pyawkward.cpp)
set_target_properties(layout PROPERTIES CXX_VISIBILITY_PRESET default)
1 change: 1 addition & 0 deletions README.md
@@ -119,6 +119,7 @@ Completed items are ☑check-marked. See [closed PRs](https://github.com/scikit-
* [ ] Translation to and from Apache Arrow and Parquet in C++.
* [ ] Persistence to any medium that stores named binary blobs, as before, but accessible via C++ (especially for writing). The persistence format might differ slightly from the existing one (break backward compatibility, if needed).
* [ ] Universal `array.get[...]` as a softer form of `array[...]` that skips non-existent indexes, rather than raising errors.
* [ ] Explicit interface with [NumExpr](https://numexpr.readthedocs.io/en/latest/index.html).

### At some point in the future

1 change: 1 addition & 0 deletions awkward1/__init__.py
@@ -2,6 +2,7 @@

import awkward1.layout
import awkward1._numba

from awkward1.operations.convert import *

__version__ = awkward1.layout.__version__
31 changes: 30 additions & 1 deletion awkward1/operations/convert.py
@@ -1,12 +1,23 @@
# BSD 3-Clause License; see https://github.com/jpivarski/awkward-1.0/blob/master/LICENSE

import numbers
import json
try:
    from collections.abc import Iterable
except ImportError:
    from collections import Iterable

import numpy

import awkward1.util
import awkward1.layout

def fromiter(iterable, initial=1024, resize=2.0):
    out = awkward1.layout.FillableArray(initial=initial, resize=resize)
    for x in iterable:
        out.fill(x)
    return out.snapshot()

def tolist(array):
    if array is None or isinstance(array, (bool, str, bytes, numbers.Number)):
        return array
@@ -26,4 +37,22 @@ def tolist(array):
    else:
        raise TypeError("unrecognized array type: {0}".format(repr(array)))

__all__ = [x for x in list(globals()) if not x.startswith("_") and x not in ("awkward1", "numpy")]
fromjson = awkward1.layout.fromjson

def tojson(array, *args, **kwargs):
    if array is None or isinstance(array, (bool, str, bytes, numbers.Number)):
        return json.dumps(array)

    elif isinstance(array, numpy.ndarray):
        return awkward1.layout.NumpyArray(array).tojson(*args, **kwargs)

    elif isinstance(array, awkward1.layout.FillableArray):
        return array.snapshot().tojson(*args, **kwargs)

    elif isinstance(array, awkward1.layout.Content):
        return array.tojson(*args, **kwargs)

    else:
        raise TypeError("unrecognized array type: {0}".format(repr(array)))

__all__ = [x for x in list(globals()) if not x.startswith("_") and x not in ("numbers", "json", "Iterable", "numpy", "awkward1")]
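`fromiter` above builds its result with a `FillableArray` whose `initial` and `resize` options default to 1024 and 2.0. The diff does not document them; assuming they set the starting capacity of each internal buffer and the factor by which a full buffer grows, the pure-Python sketch below shows that growth policy. `GrowableBuffer` is a hypothetical class for illustration, not part of awkward1.

```python
class GrowableBuffer:
    """Hypothetical append-only buffer with geometric growth (not awkward1 API)."""

    def __init__(self, initial=1024, resize=2.0):
        if initial < 1 or resize <= 1.0:
            raise ValueError("need initial >= 1 and resize > 1.0")
        self.resize = resize
        self.data = [None] * initial   # preallocated storage
        self.length = 0                # number of slots actually filled
        self.reallocations = 0

    def append(self, x):
        if self.length == len(self.data):
            # Full: build a larger list (copying the old contents) and count it.
            newsize = max(int(len(self.data) * self.resize), len(self.data) + 1)
            self.data = self.data + [None] * (newsize - len(self.data))
            self.reallocations += 1
        self.data[self.length] = x
        self.length += 1

    def snapshot(self):
        # Return only the filled part, like taking a snapshot of what was filled so far.
        return self.data[:self.length]


buf = GrowableBuffer(initial=4, resize=2.0)
for i in range(10):
    buf.append(i)
print(buf.snapshot(), len(buf.data), buf.reallocations)
```

With `resize=2.0`, filling n items triggers only about log2(n / initial) reallocations, so the copying cost per `fill` stays constant on average.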
6 changes: 6 additions & 0 deletions include/awkward/Content.h
@@ -3,9 +3,12 @@
#ifndef AWKWARD_CONTENT_H_
#define AWKWARD_CONTENT_H_

#include <cstdio>

#include "awkward/cpu-kernels/util.h"
#include "awkward/Identity.h"
#include "awkward/Slice.h"
#include "awkward/io/json.h"

namespace awkward {
class Content {
@@ -17,6 +20,7 @@ namespace awkward {
virtual void setid() = 0;
virtual void setid(const std::shared_ptr<Identity> id) = 0;
virtual const std::string tostring_part(const std::string indent, const std::string pre, const std::string post) const = 0;
virtual void tojson_part(ToJson& builder) const = 0;
virtual int64_t length() const = 0;
virtual const std::shared_ptr<Content> shallow_copy() const = 0;
virtual void checksafe() const = 0;
@@ -30,6 +34,8 @@ namespace awkward {
virtual const std::pair<int64_t, int64_t> minmax_depth() const = 0;

const std::string tostring() const;
const std::string tojson(bool pretty, int64_t maxdecimals) const;
void tojson(FILE* destination, bool pretty, int64_t maxdecimals, int64_t buffersize) const;
const std::shared_ptr<Content> getitem_ellipsis(const Slice& tail, const Index64& advanced) const;
const std::shared_ptr<Content> getitem_newaxis(const Slice& tail, const Index64& advanced) const;
};
1 change: 1 addition & 0 deletions include/awkward/Index.h
@@ -35,6 +35,7 @@ namespace awkward {
const std::string tostring_part(const std::string indent, const std::string pre, const std::string post) const;
T getitem_at(int64_t at) const;
T getitem_at_unsafe(int64_t at) const;
void setitem_at_unsafe(int64_t at, T value) const;
IndexOf<T> getitem_range(int64_t start, int64_t stop) const;
IndexOf<T> getitem_range_unsafe(int64_t start, int64_t stop) const;
virtual const std::shared_ptr<Index> shallow_copy() const;
8 changes: 7 additions & 1 deletion include/awkward/Slice.h
@@ -36,7 +36,7 @@ namespace awkward {

class SliceRange: public SliceItem {
public:
SliceRange(int64_t start, int64_t stop, int64_t step): start_(start), stop_(stop), step_(step) {
SliceRange(int64_t start, int64_t stop, int64_t step): start_(start), stop_(stop), step_(step == none() ? 1 : step) {
assert(step_ != 0);
}
int64_t start() const { return start_; }
@@ -114,6 +114,12 @@ namespace awkward {
const Slice tail() const;
const std::string tostring() const;
void append(const std::shared_ptr<SliceItem>& item);
void append(const SliceAt& item);
void append(const SliceRange& item);
void append(const SliceEllipsis& item);
void append(const SliceNewAxis& item);
template <typename T>
void append(const SliceArrayOf<T>& item);
void become_sealed();
bool isadvanced() const;

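The `SliceRange` change above substitutes a step of 1 when none was given (`step == none()`), while the constructor still asserts that the step is nonzero. Python applies the same rule to its own slices. The sketch below restates it in Python; `NONE` and `regularize_step` are hypothetical names standing in for awkward's `none()` sentinel and the constructor logic, and the sketch raises `ValueError` where the C++ code uses `assert`.

```python
NONE = None  # hypothetical stand-in for the none() sentinel used by SliceRange


def regularize_step(step):
    """Default a missing step to 1 and reject a zero step, as SliceRange now does."""
    if step is NONE:
        step = 1
    if step == 0:
        raise ValueError("slice step must not be zero")
    return step


print(regularize_step(None), regularize_step(-2))
# Python's built-in slices apply the same default when the step is omitted:
print(slice(2, 8).indices(10))
```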
1 change: 1 addition & 0 deletions include/awkward/array/ListArray.h
@@ -29,6 +29,7 @@ namespace awkward {
virtual void setid();
virtual void setid(const std::shared_ptr<Identity> id);
virtual const std::string tostring_part(const std::string indent, const std::string pre, const std::string post) const;
virtual void tojson_part(ToJson& builder) const;
virtual int64_t length() const;
virtual const std::shared_ptr<Content> shallow_copy() const;
virtual void checksafe() const;
1 change: 1 addition & 0 deletions include/awkward/array/ListOffsetArray.h
@@ -27,6 +27,7 @@ namespace awkward {
virtual void setid();
virtual void setid(const std::shared_ptr<Identity> id);
virtual const std::string tostring_part(const std::string indent, const std::string pre, const std::string post) const;
virtual void tojson_part(ToJson& builder) const;
virtual int64_t length() const;
virtual const std::shared_ptr<Content> shallow_copy() const;
virtual void checksafe() const;
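Each `Content` subclass now declares `tojson_part(ToJson& builder)`, so a nested array can serialize itself by opening a list, delegating each element to its content, and closing the list. The pure-Python sketch below illustrates that recursive pattern for a `ListOffsetArray`-like node (offsets into a flat content). The `Builder`, `Flat`, and `ListOffset` classes and their method names are hypothetical and only loosely modeled on the `ToJson` interface; Python's `json` module stands in for RapidJSON.

```python
import json


class Builder:
    """Hypothetical stand-in for a ToJson builder; collects nested Python lists."""
    def __init__(self):
        self.stack = [[]]              # bottom entry holds the finished top-level value
    def beginlist(self):
        self.stack.append([])
    def endlist(self):
        finished = self.stack.pop()
        self.stack[-1].append(finished)
    def integer(self, x):
        self.stack[-1].append(int(x))
    def result(self):
        return json.dumps(self.stack[0][0])


class Flat:
    """Leaf node: a flat run of integers, like a one-dimensional NumpyArray."""
    def __init__(self, values):
        self.values = list(values)
    def getitem_range(self, start, stop):
        return Flat(self.values[start:stop])
    def tojson_part(self, builder):
        builder.beginlist()
        for x in self.values:
            builder.integer(x)
        builder.endlist()


class ListOffset:
    """Jagged node: element i is content[offsets[i]:offsets[i + 1]]."""
    def __init__(self, offsets, content):
        self.offsets = offsets
        self.content = content
    def getitem_range(self, start, stop):
        # Keep the offsets bounding elements start..stop-1; they still index the same content.
        return ListOffset(self.offsets[start:stop + 1], self.content)
    def tojson_part(self, builder):
        builder.beginlist()
        for i in range(len(self.offsets) - 1):
            # Each element delegates to its content, so any nesting depth works.
            self.content.getitem_range(self.offsets[i], self.offsets[i + 1]).tojson_part(builder)
        builder.endlist()


inner = ListOffset([0, 2, 3, 3, 5], Flat([1, 2, 3, 4, 5]))   # [[1, 2], [3], [], [4, 5]]
outer = ListOffset([0, 2, 2, 4], inner)                       # doubly jagged
b = Builder()
outer.tojson_part(b)
print(b.result())   # [[[1, 2], [3]], [], [[], [4, 5]]]
```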
2 changes: 2 additions & 0 deletions include/awkward/array/NumpyArray.h
@@ -37,6 +37,7 @@ namespace awkward {
bool isscalar() const;
bool isempty() const;
void* byteptr() const;
void* byteptr(ssize_t at) const;
ssize_t bytelength() const;
uint8_t getbyte(ssize_t at) const;

@@ -45,6 +46,7 @@ namespace awkward {
virtual void setid();
virtual void setid(const std::shared_ptr<Identity> id);
virtual const std::string tostring_part(const std::string indent, const std::string pre, const std::string post) const;
virtual void tojson_part(ToJson& builder) const;
virtual int64_t length() const;
virtual const std::shared_ptr<Content> shallow_copy() const;
virtual void checksafe() const;
60 changes: 58 additions & 2 deletions include/awkward/array/RawArray.h
@@ -20,6 +20,26 @@
#include "awkward/Content.h"

namespace awkward {
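// Helpers that write a flat run of values through the ToJson builder.
// tojson_boolean is a non-template function defined in a header, so it is marked
// inline to avoid multiple-definition errors when several files include RawArray.h.
inline void tojson_boolean(ToJson& builder, bool* array, int64_t length) {
  for (int64_t i = 0;  i < length;  i++) {   // int64_t, not int: length may exceed 2^31
    builder.boolean(array[i]);
  }
}

template <typename T>
void tojson_integer(ToJson& builder, T* array, int64_t length) {
  for (int64_t i = 0;  i < length;  i++) {
    // Written as a JSON integer; uint64_t values above INT64_MAX would wrap in this cast.
    builder.integer((int64_t)array[i]);
  }
}

template <typename T>
void tojson_real(ToJson& builder, T* array, int64_t length) {
  for (int64_t i = 0;  i < length;  i++) {
    builder.real((double)array[i]);   // written as a floating-point number
  }
}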

template <typename T>
class RawArrayOf: public Content {
public:
@@ -123,6 +143,42 @@ namespace awkward {
return out.str();
}

virtual void tojson_part(ToJson& builder) const {
  // Dispatch on the element type T: floating-point types go through tojson_real,
  // integer types through tojson_integer, so integers are written as JSON integers.
  if (std::is_same<T, double>::value) {
    tojson_real(builder, reinterpret_cast<double*>(byteptr()), length());
  }
  else if (std::is_same<T, float>::value) {
    tojson_real(builder, reinterpret_cast<float*>(byteptr()), length());
  }
  else if (std::is_same<T, int64_t>::value) {
    tojson_integer(builder, reinterpret_cast<int64_t*>(byteptr()), length());
  }
  else if (std::is_same<T, uint64_t>::value) {
    tojson_integer(builder, reinterpret_cast<uint64_t*>(byteptr()), length());
  }
  else if (std::is_same<T, int32_t>::value) {
    tojson_integer(builder, reinterpret_cast<int32_t*>(byteptr()), length());
  }
  else if (std::is_same<T, uint32_t>::value) {
    tojson_integer(builder, reinterpret_cast<uint32_t*>(byteptr()), length());
  }
  else if (std::is_same<T, int16_t>::value) {
    tojson_integer(builder, reinterpret_cast<int16_t*>(byteptr()), length());
  }
  else if (std::is_same<T, uint16_t>::value) {
    tojson_integer(builder, reinterpret_cast<uint16_t*>(byteptr()), length());
  }
  else if (std::is_same<T, int8_t>::value) {
    tojson_integer(builder, reinterpret_cast<int8_t*>(byteptr()), length());
  }
  else if (std::is_same<T, uint8_t>::value) {
    tojson_integer(builder, reinterpret_cast<uint8_t*>(byteptr()), length());
  }
  else {
    throw std::invalid_argument(std::string("cannot convert RawArrayOf<") + typeid(T).name() + std::string("> into JSON"));
  }
}

virtual int64_t length() const { return length_; }

virtual const std::shared_ptr<Content> shallow_copy() const { return std::shared_ptr<Content>(new RawArrayOf<T>(id_, ptr_, offset_, length_, itemsize_)); }
@@ -232,7 +288,7 @@ namespace awkward {
throw std::runtime_error("array.ndim != 1");
}
Index64 flathead = array->ravel();
Error err = awkward_regularize_arrayslice_64(
struct Error err = awkward_regularize_arrayslice_64(
flathead.ptr().get(),
flathead.length(),
length_);
@@ -247,7 +303,7 @@ namespace awkward {

virtual const std::shared_ptr<Content> carry(const Index64& carry) const {
std::shared_ptr<T> ptr(new T[(size_t)carry.length()], awkward::util::array_deleter<T>());
Error err = awkward_numpyarray_getitem_next_null_64(
struct Error err = awkward_numpyarray_getitem_next_null_64(
reinterpret_cast<uint8_t*>(ptr.get()),
reinterpret_cast<uint8_t*>(ptr_.get()),
carry.length(),
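`RawArrayOf<T>::tojson_part` chooses a writer based on the element type `T`, and which writer an integer type goes through shows up in the output. The Python sketch below shows the same type-based dispatch, keyed on the standard-library `array` module's type codes. In the sketch, a value sent through the real-number path prints as `1.0`, while the integer path prints `1`. `tojson_flat` is a hypothetical helper, not part of awkward1, and Python's `json` module stands in for RapidJSON.

```python
import array
import json

INTEGER_CODES = set("bBhHiIlLqQ")   # signed and unsigned integer type codes
REAL_CODES = set("fd")              # float32 and float64


def tojson_flat(arr):
    """Serialize a flat array.array, choosing the JSON writer from its element type."""
    if arr.typecode in INTEGER_CODES:
        values = [int(x) for x in arr]       # integer path
    elif arr.typecode in REAL_CODES:
        values = [float(x) for x in arr]     # real-number path
    else:
        raise TypeError("cannot convert array of type {0!r} into JSON".format(arr.typecode))
    return json.dumps(values)


print(tojson_flat(array.array("q", [1, 2, 3])))   # [1, 2, 3]
print(tojson_flat(array.array("d", [1, 2, 3])))   # [1.0, 2.0, 3.0]
```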
