Another Pack from IBM (MONITOR_INGEST) #163
.circleci/config.yml
Outdated
@@ -30,7 +30,7 @@ jobs:
  - run:
  name: Download dependencies
  command: |
- git clone -b ${CI_BRANCH:-master} git@github.com:StackStorm-Exchange/ci.git ~/ci
+ git clone -b ${CI_BRANCH:-master} git@github.com:Anshika-Gautam/ci.git ~/ci
Need to fix this
Our pack uses mam-sdk, which depends on the iotfunctions package. The CI repository cloned from "git@github.com:StackStorm-Exchange/ci.git" into ~/ci installs pip 9.0.3, which cannot fetch the iotfunctions package from the specified git repository. This results in the error below on CircleCI:
`Using /home/circleci/virtualenv/lib/python3.6/site-packages
Finished processing dependencies for st2common==3.4.dev0
- [[ -f /home/circleci/repo/requirements.txt ]]
- echo 'Installing pack requirements from /home/circleci/repo/requirements.txt'
Installing pack requirements from /home/circleci/repo/requirements.txt
- /home/circleci/virtualenv/bin/pip install -r /home/circleci/repo/requirements.txt
Collecting git+https://github.com/ibm-watson-iot/maximo-asset-monitor-sdk.git (from -r /home/circleci/repo/requirements.txt (line 2))
Cloning https://github.com/ibm-watson-iot/maximo-asset-monitor-sdk.git to /tmp/pip-o1f94hft-build
Warning: Permanently added the RSA host key for IP address '140.82.113.3' to the list of known hosts.
Collecting pandas-schema (from -r /home/circleci/repo/requirements.txt (line 1))
Downloading https://files.pythonhosted.org/packages/9c/03/6d87ce8719dc57e44688096c05fb0efa61a08c6838816c9d991b1ece5b24/pandas_schema-0.3.5-py3-none-any.whl
Collecting jsonschema>=3.2.0 (from mam-sdk==0.0.0->-r /home/circleci/repo/requirements.txt (line 2))
Cache entry deserialization failed, entry ignored
Downloading https://files.pythonhosted.org/packages/c5/8f/51e89ce52a085483359217bc72cdbf6e75ee595d5b1d4b5ade40c7e018b8/jsonschema-3.2.0-py2.py3-none-any.whl (56kB)
100% |████████████████████████████████| 61kB 6.6MB/s eta 0:00:01
Collecting iotfunctions@ git+https://github.com/ibm-watson-iot/functions.git@production#egg=iotfunctions (from mam-sdk==0.0.0->-r /home/circleci/repo/requirements.txt (line 2))
Could not find a version that satisfies the requirement iotfunctions@ git+https://github.com/ibm-watson-iot/functions.git@production#egg=iotfunctions (from mam-sdk==0.0.0->-r /home/circleci/repo/requirements.txt (line 2)) (from versions: )
No matching distribution found for iotfunctions@ git+https://github.com/ibm-watson-iot/functions.git@production#egg=iotfunctions (from mam-sdk==0.0.0->-r /home/circleci/repo/requirements.txt (line 2))
You are using pip version 9.0.3, however version 21.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Exited with code exit status 1
CircleCI received exit code 1`
We removed the pip restriction in our fork, "git@github.com:Anshika-Gautam/ci.git", and after that we were able to download all the dependencies for our pack. However, the run test step now fails for the pack, as shown in the snapshot below:
    type: "string"
    required: true
  json_schema_path:
    description: "json Schema is must to validate CSV in case of action_type : DataClean"
What is this a JSON schema for?
The JSON schema defines the format in which the data for ingestion must be supplied to our pack's data_ingest action.
Does _path refer to a file on disk?
And does this mean that the file will need to exist on all st2actionrunner nodes?
@blag Yes, _path refers to a file on disk. What are you trying to point out about the st2actionrunner nodes using the path? Can you please explain?
Note: the _path is mandatory to execute the setup_entity action.
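To make the question about st2actionrunner nodes concrete: an action that reads a schema from disk only works on a node where that file exists. The sketch below is an illustration under assumptions, not the pack's code; `load_json_schema` is a hypothetical helper that fails early with a clear message when the file is missing.

```python
import json
import os


def load_json_schema(json_schema_path):
    """Load a JSON schema from disk, failing early if the file is absent.

    Hypothetical helper: the name and error message are illustrative only.
    """
    if not os.path.isfile(json_schema_path):
        # On a multi-node install, the file must be present on whichever
        # st2actionrunner node picks up the execution.
        raise ValueError('JSON schema file not found: %s' % json_schema_path)
    with open(json_schema_path) as schema_file:
        return json.load(schema_file)
```

A check like this does not remove the need to distribute the file, but it turns a confusing failure into an explicit one.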

def run(self):
    # define validation elements
    print('1. Starting data Clean Action ..')
Should replace prints with self.logger.debug()
Updated the print statements as suggested.
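As a rough illustration of the suggestion above, progress messages can go through a logger rather than print(). This is a minimal sketch using a stand-in class so it runs without StackStorm installed; it assumes the real action would subclass StackStorm's Action base class, which provides self.logger. The class name follows the rename proposed later in this review.

```python
import logging


class DataCleanCsv(object):
    """Stand-in for the action class (assumption: the real class gets
    self.logger from the StackStorm Action base class)."""

    def __init__(self, logger=None):
        self.logger = logger or logging.getLogger(__name__)

    def run(self):
        # Progress messages go through the logger instead of print().
        self.logger.debug('1. Starting data clean action ..')
        return True
```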
if errors:
    errors_index_rows = [e.row for e in errors]
    print('5. Cleaning input CSV data ..')
    data_clean = data.drop(index=errors_index_rows)
I don't think it's great practice to hard-code paths like this in your actions. You're going to run into problems when there is more than one instance of your action running at the same time.
I have removed the hard-coded paths. Thanks for your suggestion.
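One way to avoid collisions between concurrent executions is to let the standard library pick a unique file name per run. The sketch below is only an assumption about how that could look; `write_clean_csv` and the file prefix are hypothetical and do not come from the pack.

```python
import os
import tempfile


def write_clean_csv(csv_text):
    """Write cleaned CSV text to a unique temporary file and return its path.

    Hypothetical helper: each call gets its own file name, so two
    executions running at the same time do not overwrite each other.
    """
    fd, path = tempfile.mkstemp(prefix='data_clean_', suffix='.csv')
    with os.fdopen(fd, 'w') as out_file:
        out_file.write(csv_text)
    return path
```

The caller is responsible for deleting the file once it is no longer needed.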
@@ -0,0 +1,102 @@
import json
From a naming perspective, a better name would be data_clean_csv; you can leave out the _action.
updated
@@ -0,0 +1,37 @@
import yaml
same with the name here, maybe just data_ingest_csv
renamed
dimension_data_path = None
function_data_path = None

if self._action_type == "SetupEntityAction":
curious why these are not parameters to the action and instead hard coded in the config?
Our config file acts as a handler that selects which action to perform on the entity. We have updated the code to follow a modular approach: instead of a chain of if statements, we now use functions to call a particular action type.
Hi @nmaludy, could you please advise on the pip version issue? It is blocking this submission. -Abhay
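The modular approach described here could be sketched as a dispatch table that maps each action type to its own method. This is an illustration under assumptions: the class, method names, and return strings are hypothetical, and only the action type strings "SetupEntityAction" and "LoadCsv" appear in the pack's diff.

```python
class MonitorIngestAction(object):
    """Stand-in for the pack's action class; all names are hypothetical."""

    def __init__(self, action_type):
        self._action_type = action_type

    def setup_entity(self):
        return 'entity set up'

    def load_csv(self):
        return 'csv loaded'

    def run(self):
        # Map each supported action type to the method that handles it.
        handlers = {
            'SetupEntityAction': self.setup_entity,
            'LoadCsv': self.load_csv,
        }
        handler = handlers.get(self._action_type)
        if handler is None:
            raise ValueError('Unknown action type: %s' % self._action_type)
        return handler()
```

Adding a new action type then means adding one method and one dictionary entry, rather than another branch.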
  entity_name:
    description: "Entity Name is must in case of action_type : LoadCsv"
    type: "string"
  data_file_path:
Where does the file identified here come from?
@blag It is the same file that resides on disk. This variable holds the path to that file.
    raise ValueError('Missing action type key in config file')

def run(self):
    operations_completed = {}
I do not like using a mutable dictionary as a global variable. That's not intuitive and difficult to debug. Since all of the setup_* functions are mutually exclusive, simply have them return the result you would like to return to StackStorm.
@blag removed mutable dictionary

"""----------STATUS----------"""
self.logger.info('RESULT :')
for name, status in operations_completed.items():
All of the logic in this loop is overcomplicated. Why not just return a tuple of (success, result_text) to StackStorm? Seems much cleaner to me.
@blag Simplified the logic to return boolean values.
@nmaludy @blag We have a blocker regarding the pip version used for downloading dependencies, as mentioned in thread #163 (comment).
StackStorm-Exchange/ci#102 was merged today, which updates the pinned version of pip to 20.0.2. Please push another commit to restart the tests.
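The (success, result_text) suggestion can be sketched as follows. This is a minimal illustration, assuming the action's run method hands the tuple straight back to StackStorm; `run_setup` and the shape of `steps` are hypothetical and not taken from the pack.

```python
def run_setup(steps):
    """Run each step in order and return (success, result_text).

    `steps` is a hypothetical list of (name, callable) pairs where each
    callable returns True on success.
    """
    for name, step in steps:
        if not step():
            # Stop at the first failure and report which step failed.
            return (False, '%s failed' % name)
    return (True, 'all operations completed')
```

With this shape there is no shared dictionary to inspect afterwards: the first element reports success or failure and the second carries the message.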
The following error was encountered while running CircleCI:
Traceback (most recent call last):
  File "/home/circleci/ci/.circle/validate.py", line 21, in <module>
    from st2common.models.api.pack import PackAPI
  File "/tmp/st2/st2common/st2common/models/api/pack.py", line 25, in <module>
    from st2common.util import schema as util_schema
  File "/tmp/st2/st2common/st2common/util/schema/__init__.py", line 112, in <module>
    "allOf": _validators.allOf_draft4,
AttributeError: module 'jsonschema._validators' has no attribute 'allOf_draft4'
Unable to retrieve pack name.
To avoid this error, jsonschema is restricted to version 3.0.0.
Eww. That's a nasty dependency conflict.
@cognifloyd Are you working on updating StackStorm packs to use the latest versions of packages such as jsonschema? As you can see, the conflict arises because these dependencies (st2, mam-sdk) require different versions.
I have a variety of other things I'm working on contributing. Recently I helped to get pip pinned to the same version in several of the StackStorm repos. That's how I came across this new pack. :) Would you be able to look into what it will take to update
I'm closing and reopening this to trigger the latest CI.
This is another pack from IBM, similar to the first one, monitor_mqtt.