Ask Time:2021-09-22T05:41:13         Author:asuscondo

Does anyone know the proper way to write out a group of files based on the value of a column in Dask? In other words, I want to group the rows by the value in one column and write those groups out to CSVs. I've been trying to use the groupby-apply paradigm with Dask, but the problem is that the applied function does not receive a dask.dataframe object: each group arrives as a pandas DataFrame, so the function I apply has to use the Pandas API.

Is there a better way to approach what I'm trying to do? A scalable solution would be much appreciated because some of the data that I'm dealing with is very large.


