Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial attempt at enabling reading the columns from the datasource #45

Merged
merged 8 commits into from
Jul 1, 2016
Merged

Initial attempt at enabling reading the columns from the datasource #45

merged 8 commits into from
Jul 1, 2016

Conversation

graysonarts
Copy link
Contributor

Addresses issue #42 . This enables the ability to read columns. I want to do more extensive testing, and I still need to add documentation into the code (pydoc stuff), but this is the initial attempt for feedback on the approach.

@benlower
Copy link
Contributor

Nice. Some questions:

  • what is the thinking with the version of "version='0.1.0-dev0'"? how is that being used and when will we increment the patch number?
  • i don't think this enables writing columns, correct? what would that entail?

@graysonarts
Copy link
Contributor Author

@benlower the version number was changed to appease setuptools. That's the preferred format for in-development versions. I'd honestly, leave it at 0 unless we need to push a development version to pypi for some reason (let's try to not do that). We'll increment the minor for the next release of features, but that'll be done right before we merge into master.

This does not address writing columns at all. We'll need more logic, but I wanted to unblock @t8y8 for reading columns as quickly as possible, so focused on reading first, and then I'll add writing.

@t8y8
Copy link
Contributor

t8y8 commented Jun 30, 2016

Will this work for TWBs? The XPath queries look like they shouldn't care, but workbooks might handle local/remote fields slightly differently?

@@ -0,0 +1,62 @@
import functools

_ATTRIBUTES = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you comment how each of these fields relates to the UI, it's not intuitive that caption is what most people would call the 'name' in the UI, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can do that.

@graysonarts
Copy link
Contributor Author

@t8y8 For a workbook, you'd need to iterate through all of the datasources and get the columns from the datasource. Since workbook creates a Datasource object for each <datasource> tag passing only that tag into the object, it should "just work". I need to write test cases to ensure that it's true, though.

Do you want a method on workbook that gives you everything by combining datasource name and column name in the standard way: [datasource].[column]

@t8y8
Copy link
Contributor

t8y8 commented Jun 30, 2016

That makes sense, iterating through the datasources to get the fields works for me!

A convenience method "get_all_fields_with_datasource" or something is a nice addition for the "get a list of things to audit" use case.

@graysonarts
Copy link
Contributor Author

whew. I had to rebase that with the feature @t8y8 just merged in, and I was scared force pushing to github on my feature branch was going to do messy things, but yay! Must just be perforce withdrawals

@t8y8
Copy link
Contributor

t8y8 commented Jun 30, 2016

Running this through some real work workbooks and I noticed a few things:

  1. When it's a alias, vs caption, vs 'name' matters in terms of format, sometimes you get []'s and sometimes you don't.
  2. I triggered an exception, I'll open an issue with the stacktrace. It looks like bad escaping when building the xpath query
  3. It's tough to know when to use alias vs name vs caption, I have a gnarly
    [print(fields[i].alias or fields[i].caption or fields[i].name) for i in fields]`

@graysonarts
Copy link
Contributor Author

@t8y8 Suggested change to the API of this change

  • Rename name to id
  • Add a property name that resolves the most common name
  • Create an extended ordereddict that will attempt to resolve name for look up to match either id, caption, or alias

My understanding of how we display names in the UI is:

  • If it has an alias, it's the alias
  • If it has a caption, it's the caption
  • Otherwise, it's the name (though I think we never use the name because caption always exists)

@t8y8
Copy link
Contributor

t8y8 commented Jul 1, 2016

I like the API proposal, in the common case folks are trying to get a list of fields and matching Desktop is an intuitive default. For folks who need remote name they can go for 'id'. It remvoes the need for this function in my script too:

def pretty_name(field): if not fields[field].alias: pretty = fields[field].caption or fields[field].name else: pretty = field.alias return pretty

The ordereddict is probably only with it if it matches document order though, if it doesn't I wouldn't continue the trouble of the overhead (though py35 gets a c-based ordereddict) -- is it verified that it matches doc order?

@graysonarts
Copy link
Contributor Author

I haven't verified that it matches the order in the UI, but I'll do that before implementing the more complex look up.

@graysonarts
Copy link
Contributor Author

Okay playing around in the UI, it looks like fields are not related to document order. We display them in alphabetic order in the measures and dimensions panes. I think I'm going to ignore order for this PR and if we decide we want to be specific about order, I'll do it as a separate PR.

@t8y8
Copy link
Contributor

t8y8 commented Jul 1, 2016

Sounds good to me

@@ -10,6 +10,7 @@

from tableaudocumentapi import Connection, xfile
from tableaudocumentapi import Field
from tableaudocumentapi.multilookup_dict import MultiLookupDict
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't address the renaming, I'm working on that next.

@graysonarts graysonarts self-assigned this Jul 1, 2016
@t8y8
Copy link
Contributor

t8y8 commented Jul 1, 2016

I'm writing against the new api changes in a local branch, and it feels pretty good!

@graysonarts
Copy link
Contributor Author

yay! is that a "LGTM" for merging?

@t8y8
Copy link
Contributor

t8y8 commented Jul 1, 2016

Yup. LGTM to merge.

Just wrote some random lines against it, and it's feeling pretty good.

Since it's ultimately a dict, the iteration is over the keys by default, that does mean the code has to do something like:
for field in fields:\n print(fields[field].calculation)

But that's easily fixed with a for field in fields.values(): ... and I think ultimately the current design is the most flexible. Just a thing to note in our samples.

I'll share the script I was able to generate when combining this with the REST API! IT TOTALLY DOES WHAT I NEED NOW.

@graysonarts
Copy link
Contributor Author

@t8y8 Awesome, merging in just a moment.
When iterating over dictionaries in python the usual idiom is:

for k, v in datasources.items():
    # k is the key in the dictionary (in this case the id)
    # v is the value (the field object)

@graysonarts graysonarts merged commit 481f38c into tableau:development Jul 1, 2016
@graysonarts graysonarts deleted the feature-get-fields branch July 1, 2016 23:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants