Add way to generate DataFrame from active_record with aggregated fields #523

Open
janpeterka opened this issue Feb 28, 2020 · 3 comments

janpeterka commented Feb 28, 2020

I recently ran into a speed problem when using daru and found that one way to speed things up was to do more work in the database, namely grouping and aggregating in ActiveRecord/the database instead of on the DataFrame.

Here is what I wanted to use:

active_record = Provider.left_joins(zip: :district).group(:id)

However, I found out that I cannot pass a field like ANY_VALUE(district.id), because it gets converted to a symbol in from_activerecord, and pluck subsequently tries to convert it to table.column.
(At least that's how I understand it.)

So we found a way to bypass this, and I was thinking about adding it to daru, in something like this:
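
To illustrate the conversion described above, here is a small plain-Ruby sketch (no ActiveRecord or daru required). The field list is only an example; how pluck then treats each value is as described in the comment and is not reproduced here.

```ruby
# Example field list: one plain column name and one SQL expression.
fields = ["id", "ANY_VALUE(district.id)"]

# The existing behaviour: every field is turned into a Symbol,
# so the SQL expression ends up typed the same way as a column name.
symbolized = fields.map(&:to_sym)

# The proposed alternative: keep the fields as Strings.
stringified = fields.map(&:to_s)

puts symbolized.inspect
puts stringified.inspect
```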

      # Load dataframe from AR::Relation
      #
      # @param relation [ActiveRecord::Relation] A relation to be used to load the contents of dataframe
      # @param with_sql_methods [Boolean] Enables giving fields with SQL methods
      #
      # @return A dataframe containing the data in the given relation
      def from_activerecord(relation, *fields, with_sql_methods: false)
        fields = relation.klass.column_names if fields.empty?

        # Keep raw SQL expressions as strings; plain column names become symbols
        fields = if with_sql_methods
                   fields.map(&:to_s)
                 else
                   fields.map(&:to_sym)
                 end

        result = relation.pluck(*fields).transpose
        Daru::DataFrame.new(result, order: fields).tap(&:update)
      end

Now I can create a new DataFrame as

data_frame = Daru::DataFrame.from_activerecord(active_record,
                                              "ANY_VALUE(district.id)",
                                              with_sql_methods: true)
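
The field handling in this proposal can be tried in isolation. Below is a minimal sketch in plain Ruby, without ActiveRecord or daru. Both helper names (`normalize_fields`, `rows_to_columns`) are hypothetical and not part of daru. The `flatten` call is an added assumption: since `*fields` is a splat, passing an array literal would otherwise arrive nested. The second helper reflects that pluck returns a flat array when given a single field, which cannot be transposed directly.

```ruby
# Hypothetical helper (not part of daru): the field-normalization step.
def normalize_fields(fields, with_sql_methods: false)
  # Assumption: flatten so that f("a", "b") and f(["a", "b"]) behave the same.
  flat = fields.flatten
  with_sql_methods ? flat.map(&:to_s) : flat.map(&:to_sym)
end

# Hypothetical helper (not part of daru): turn plucked rows into columns.
# With one field, pluck yields a flat array of values, i.e. a single column.
def rows_to_columns(rows, field_count)
  field_count == 1 ? [rows] : rows.transpose
end

sql_fields   = normalize_fields([["ANY_VALUE(district.id)"]], with_sql_methods: true)
plain_fields = normalize_fields(%w[id name])

# Stand-ins for what pluck would return (made-up values for illustration).
single_column = rows_to_columns([10, 20, 30], 1)
two_columns   = rows_to_columns([[1, "a"], [2, "b"]], 2)

puts sql_fields.inspect
puts plain_fields.inspect
puts single_column.inspect
puts two_columns.inspect
```

Keeping the normalization separate from the query makes the flag's effect easy to check without a database.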

What do you think about that?

@athityakumar (Member)

@janmpeterka - Thanks for this feature request suggestion! 🎉

You'd have to contribute this to the daru-io repository. We currently have the implementation of the ActiveRecord importer here, which supports only plain field names and not SQL methods. You can probably add a has_sql_methods flag / keyword argument; the rest of the logic you may already find in the existing importer itself 😄

Would you like to contribute this feature, @janmpeterka?

@janpeterka (Author)

Thanks, I will look into it. Not sure if I will be able to write the implementation myself, quite new to Ruby :)

@janpeterka (Author)

Well, daru-io is quite inactive, so no contributing there.
