Add object mapping #508

Closed
nyamsprod opened this issue Oct 30, 2023 · 12 comments
@nyamsprod
Member

Feature Request

Q A
New Feature yes
BC Break no

Introduction

One of the most requested features is the ability to map a CSV record to an object.
Currently the package only allows:

  • array<array-key, string|null>
  • typed array with broader support for record value type via the use of Reader::addFormatter.

Proposal

We need a mechanism that allows converting an array to an object. The mechanism should be:

  • transparent
  • easy to use and maintain
  • extensible to allow for user defined rules on casting

In order to do so we propose to introduce a new map method to TabularDataReader with the following signature:

TabularDataReader::map(string $class): Iterator;

Let's assume the following CSV document.

$weather = <<<CSV
date,temperature,place
2011-01-01,1,Galway
2011-01-02,-1,Galway
2011-01-03,0,Galway
2011-01-01,6,Berkeley
2011-01-02,8,Berkeley
2011-01-03,5,Berkeley
CSV;

And the following DTO and Enum classes.

class Weather
{
    private DateTime $createdAt;

    public function __construct(
        #[League\Csv\Attribute\Column(offset:'temperature')]
        public readonly int $temperature,
        #[League\Csv\Attribute\Column(offset:2, cast: CastToEnum::class)]
        public readonly Place $place,
    ) {
    }

    public function getDate(): DateTime
    {
        return $this->createdAt;
    }

    #[League\Csv\Attribute\Column(
        offset: 'date',
        cast: CastToDate::class,
        castArguments: ['format' => '!Y-m-d', 'timezone' => 'Africa/Kinshasa']
    )]
    public function setDate(DateTime $createdAt): void
    {
        $this->createdAt = $createdAt;
    }
}

enum Place: string
{
    case Galway = 'Galway';
    case Berkeley = 'Berkeley';
}

Using the new map method we should be able to do the following:

$csv = Reader::createFromString($weather);
$csv->setHeaderOffset(0);
$record = [...$csv][0];
$weather = [...$csv->map(Weather::class)][0];

var_dump($record);
// displays ['date' => '2011-01-01', 'temperature' => '1', 'place' => 'Galway']

echo $weather->getDate()->format('d/m/Y'); // displays '01/01/2011'
echo $weather->place->value; // displays the enum value 'Galway'
echo $weather->temperature;  // displays 1 (stored as an integer)

The $record example is kept to highlight the difference between both approaches.

In order to achieve this mapping the following will be introduced:

  • an attribute, League\Csv\Attribute\Column, to control the mapping
  • a mechanism to allow user-defined casting via a simple interface, League\Csv\Cast\Cast
  • internal classes to handle mapping using PHP Reflection.
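
To make the three pieces above concrete, here is a minimal, self-contained sketch of how an attribute, a cast interface and a Reflection-based mapper could fit together. Every name in it (SketchColumn, SketchCast, SketchCastToDate, SketchWeather, sketchMap) is hypothetical and chosen for illustration only; it is not the League\Csv implementation, and it only handles public properties.

```php
<?php

declare(strict_types=1);

// Hypothetical names throughout: this is an illustration, not the League\Csv API.

#[Attribute(Attribute::TARGET_PROPERTY)]
final class SketchColumn
{
    /** @param array<string, mixed> $castArguments */
    public function __construct(
        public readonly string|int $offset,
        public readonly ?string $cast = null,
        public readonly array $castArguments = [],
    ) {
    }
}

interface SketchCast
{
    /** @param array<string, mixed> $arguments */
    public function toVariable(?string $value, array $arguments): mixed;
}

final class SketchCastToDate implements SketchCast
{
    public function toVariable(?string $value, array $arguments): mixed
    {
        $date = DateTimeImmutable::createFromFormat($arguments['format'] ?? '!Y-m-d', (string) $value);
        if (false === $date) {
            throw new InvalidArgumentException("Unable to cast `$value` to a date.");
        }

        return $date;
    }
}

final class SketchWeather
{
    #[SketchColumn(offset: 'temperature')]
    public int $temperature;

    #[SketchColumn(offset: 'date', cast: SketchCastToDate::class, castArguments: ['format' => '!Y-m-d'])]
    public DateTimeImmutable $date;
}

/**
 * @template T of object
 * @param class-string<T> $class
 * @param array<array-key, string|null> $record
 * @return T
 */
function sketchMap(string $class, array $record): object
{
    $reflection = new ReflectionClass($class);
    $object = $reflection->newInstanceWithoutConstructor();

    foreach ($reflection->getProperties() as $property) {
        $attributes = $property->getAttributes(SketchColumn::class);
        if ([] === $attributes) {
            continue; // properties without the attribute are not mapped
        }

        $column = $attributes[0]->newInstance();
        $value = $record[$column->offset] ?? null;
        $type = $property->getType();

        if (null !== $column->cast) {
            // user-defined casting through the cast interface
            $castClass = $column->cast;
            $value = (new $castClass())->toVariable($value, $column->castArguments);
        } elseif ($type instanceof ReflectionNamedType && 'int' === $type->getName()) {
            // built-in casting, reduced here to integers only
            $value = (int) $value;
        }

        $property->setValue($object, $value);
    }

    return $object;
}

$weather = sketchMap(SketchWeather::class, ['date' => '2011-01-01', 'temperature' => '1', 'place' => 'Galway']);
```

The record cell is looked up through the attribute's offset, cast either by the declared cast class or by the property type, and then written through Reflection, so the DTO itself stays free of parsing logic.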

Breaking changes

Developers who have extended League\Csv\Reader and/or League\Csv\ResultSet with their own map method may experience a BC break.

@nyamsprod nyamsprod self-assigned this Oct 30, 2023
nyamsprod added a commit that referenced this issue Oct 30, 2023
@tacman
Contributor

tacman commented Oct 31, 2023

I like the idea of mapping, and I have a few ideas.

I think the entity class itself should have an attribute:

#[League\Csv\Attribute\Sheet()]  // or Table() Csv()?
class Movie
{
    public string $title;
    public int $year;
    public string $imdbId;
}

If the property name matches the column name (ignoring differences in case, snake_case versus camelCase, spacing, etc.), then the mapping should happen automatically. That would cover many common cases.
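
One possible way to do this kind of loose matching is to reduce both header names and property names to a common key before comparing them. The sketch below assumes that dropping every non-alphanumeric character and lower-casing the result is an acceptable rule; the helper names sketchNormalizeName and sketchMatchHeader are hypothetical.

```php
<?php

declare(strict_types=1);

/**
 * Reduces a header or property name to a comparable key:
 * lower-case, with every non-alphanumeric character removed.
 */
function sketchNormalizeName(string $name): string
{
    return strtolower((string) preg_replace('/[^A-Za-z0-9]+/', '', $name));
}

/**
 * Hypothetical helper: returns a map of property name => header name.
 *
 * @param list<string> $header
 * @param list<string> $properties
 * @return array<string, string>
 */
function sketchMatchHeader(array $header, array $properties): array
{
    // index the header by its normalized form
    $lookup = [];
    foreach ($header as $column) {
        $lookup[sketchNormalizeName($column)] = $column;
    }

    // keep only the properties whose normalized name exists in the header
    $matches = [];
    foreach ($properties as $property) {
        $key = sketchNormalizeName($property);
        if (isset($lookup[$key])) {
            $matches[$property] = $lookup[$key];
        }
    }

    return $matches;
}

$matches = sketchMatchHeader(
    ['Title', 'release_year', 'IMDB id'],
    ['title', 'releaseYear', 'imdbId', 'rating']
);
// $matches is ['title' => 'Title', 'releaseYear' => 'release_year', 'imdbId' => 'IMDB id']
```

Properties with no counterpart in the header (here `rating`) are simply left out, so the caller can decide whether that is an error.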

I like the cast: property, as that is often what is needed most. I'm wondering if we can have an inline function there,

#[Column(cast: fn(string $s) => (int) str_replace('tt', '', $s))] # changes tt1000 to 1000
public int $imdbId;

Ideally, we'd be able to handle related objects, but this requires more discussion.

class Subtitle {
     public Movie $movie;
     public string $isoLanguageCode;
}

One issue for me is a consistent way of defining an object with simple types (string, int, float, bool) without needing a DTO, but instead having some sort of configuration. That is, the results of these attributes would act the same as adding a custom formatter.

$reader = Reader::createFromString($csvString)->setHeaderOffset(0);
$reader->addFormatter(function (array $record): array {
    foreach ($record as $var => $val) {
        // strip the 'tt' prefix from the IMDb id, leave every other field untouched
        $record[$var] = match ($var) {
            'imdbId' => (int) str_replace('tt', '', $val),
            default => $val,
        };
    }

    return $record;
});

The missing step is mapping the $record to the DTO class. I usually use the Symfony serializer/normalizer services to do that.

Thanks for bringing this topic up. It's complicated indeed, but I think it will be very powerful.

@tacman
Contributor

tacman commented Oct 31, 2023

I have a Symfony bundle that uses this library. It's pretty messy because I used a different CsvReader before and now it's a mishmash of code.

But brainstorming about what I'd like to see within that bundle, I can imagine firing a ReadRecord event that applies a formatter.

$reader = Reader::createFromString($csvString)->setHeaderOffset(0);
foreach ($reader as $row) {
    // events get called automatically, no need to specify here.
}

#[AsEventListener(event: CsvReaderEvent::READ_ROW, method: 'convert')]
class CsvReaderEventListener {
    public function convert(CsvReaderEvent $event): void
    {
        $record = $event->getRecord();
        $record['imdbId'] = str_replace('tt', '', $record['imdbId']);
        $event->setRecord($record);
    }
}

I haven't thought through how the event would actually be structured, since multiple listeners could be called. At some point we need to call the map to move from the array to the object.

The reason to consider events is to inject services that can't be injected into the DTO.

Again, just brainstorming here.

@nyamsprod
Member Author

@tacman thanks for the feedback. First of all, I do not plan on building yet another full-blown serializer; there are a lot of robust and battle-tested serializers out there (Symfony, Serde and, in some ways, Valinor could all be considered great libraries for this).
The feature added in League\Csv allows some types of deserialization and will try to address most use cases, but definitely not all of them.

IMHO...

#[League\Csv\Attribute\Sheet()]  // or Table() Csv()?
class Movie
{
    public string $title;
    public int $year;
    public string $imdbId;
}

is a great idea and I will implement it. It resolves a lot of use cases for the user. I think it will look like this in the end:

#[Record]
class Movie
{
    public string $title;
    public int $year;
    public string $imdbId;
    #[Cell(offset:'imdbId', cast:CastToDate::class, castArguments:['format' => 'd-m-Y'])]
    public DateTimeInterface $releaseDate;
}

So that the Cell attribute can fine tune the casting if needed.

Adding closures or auto-resolving inner classes is IMHO out of scope for now.

We might revisit it in the future, but for now it would add a lot of burden to the feature. Right now the targeted audience is casting to simple DTOs; anything involving union or intersection types, or auto-resolving unknown classes, is IMHO out of scope.
Let's ship a simple feature which is easy to use and to extend and see how people react.

For now, if you really need more complex casting, I believe you are better off using a good old foreach loop on the Reader and applying your business logic sequentially... my 2 cents.
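
As a rough illustration of that foreach approach, the sketch below builds DTOs by hand from each record. To stay self-contained it parses a small sample string with str_getcsv instead of using the Reader, but the loop body would be the same when iterating over a Reader with a header offset set. The SketchMovie class and the sample rows are assumptions made for the example.

```php
<?php

declare(strict_types=1);

// Hypothetical DTO used only for this illustration.
final class SketchMovie
{
    public function __construct(
        public readonly string $title,
        public readonly int $year,
        public readonly int $imdbId,
    ) {
    }
}

// Sample data, assumed for the example.
$csv = <<<CSV
title,year,imdbId
Casablanca,1942,tt0034583
Vertigo,1958,tt0052357
CSV;

$lines = explode("\n", $csv);
$header = str_getcsv(array_shift($lines), ',', '"', '');

$movies = [];
foreach ($lines as $line) {
    // associate each cell with its header name, as the Reader does with setHeaderOffset(0)
    $record = array_combine($header, str_getcsv($line, ',', '"', ''));

    // business logic is applied sequentially, one field at a time
    $movies[] = new SketchMovie(
        $record['title'],
        (int) $record['year'],
        (int) str_replace('tt', '', $record['imdbId']),
    );
}
```

The casting rules live in plain code, which makes them easy to debug at the cost of repeating them for every DTO.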

@tacman
Contributor

tacman commented Oct 31, 2023

Sounds good, I look forward to it.

Question: when you parse out the attributes, do you end up with a set of mapping rules? If so, is it possible to set those rules in some other way besides the class attributes? In my code, I created a schema map that went from the CSV column to the object property, but indeed it got ugly quickly!
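
A schema map of that kind could look roughly like the sketch below, where the rules are a plain array instead of class attributes. The rule format (CSV column => [property name, casting callable]) and the names SketchFilm and sketchApplyRules are assumptions for illustration; nothing here reflects how the library stores its rules internally.

```php
<?php

declare(strict_types=1);

// Hypothetical target class with public, typed properties.
final class SketchFilm
{
    public string $title = '';
    public int $releaseYear = 0;
}

// Hypothetical rule set: CSV column => [property name, casting callable].
$rules = [
    'Title' => ['title', 'strval'],
    'release_year' => ['releaseYear', 'intval'],
];

/**
 * Applies each rule whose column exists in the record.
 *
 * @param array<string, string|null> $record
 * @param array<string, array{0: string, 1: callable}> $rules
 */
function sketchApplyRules(object $target, array $record, array $rules): object
{
    foreach ($rules as $column => [$property, $cast]) {
        if (array_key_exists($column, $record)) {
            $target->{$property} = $cast($record[$column]);
        }
    }

    return $target;
}

$film = sketchApplyRules(
    new SketchFilm(),
    ['Title' => 'Vertigo', 'release_year' => '1958'],
    $rules
);
```

Because the rules are data, they could be loaded from configuration, which is the trade-off against attributes: more flexible, but no longer visible next to the property they affect.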

Of course you'd want to leverage existing serializers. Perhaps documenting it would be enough for many users.

@tacman
Contributor

tacman commented Oct 31, 2023

what is the offset?

 #[Cell(offset:'imdbId'

?

@nyamsprod
Member Author

what is the offset?

 #[Cell(offset:'imdbId'

?

This specifies which record cell to use, either by its header name or by its integer offset in the record.
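
A small sketch of that lookup, assuming a record keyed by header name: a string offset is used as the key, while an integer offset is treated as the cell position. The function name sketchCell is hypothetical.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper resolving a cell from a record.
 *
 * @param array<string, string|null> $record keyed by header name
 */
function sketchCell(array $record, string|int $offset): ?string
{
    if (is_int($offset)) {
        // integer offset: position of the cell inside the record
        $record = array_values($record);
    }

    if (!array_key_exists($offset, $record)) {
        throw new OutOfBoundsException("No cell found for offset `$offset`.");
    }

    return $record[$offset];
}

$record = ['date' => '2011-01-01', 'temperature' => '1', 'place' => 'Galway'];

$byName = sketchCell($record, 'temperature'); // '1'
$byPosition = sketchCell($record, 2);         // 'Galway'
```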

@tacman
Contributor

tacman commented Oct 31, 2023

so in the example, it should be 'releaseYear', or something like that, then releaseYear would get mapped to releaseDate, which is a DateTime object.

@nyamsprod
Member Author

so in the example, it should be 'releaseYear', or something like that, then releaseYear would get mapped to releaseDate, which is a DateTime object.

https://github.com/thephpleague/csv/blob/feature/object-casting/src/SerializerTest.php

you have concrete examples in the PR tests

nyamsprod added a commit that referenced this issue Nov 2, 2023
Adding the ability to deserialize a record into an object
@nyamsprod
Member Author

@tacman the feature is now complete and stable on the master branch. You can already try it out by downloading dev-master. The final documentation is also already available on the documentation website at https://csv.thephpleague.com/9.0/reader/record-mapping/

@nyamsprod
Member Author

The feature is now available starting with version 9.12.0. Some changes and tweaks were made during the testing period.

@thePanz

thePanz commented Jan 16, 2024

Hi @nyamsprod , thanks for the implementation!

It would be nice to have the object iterators typed.
As an example (I did not test the code):

    /**
     * @template T of object
     * 
     * @param class-string<T> $className
     * @param array<string> $header
     * @return Iterator<T>
     *
     * @throws Exception
     * @throws MappingFailed
     * @throws TypeCastingFailed
     */
    public function getObjects(string $className, array $header = []): Iterator
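
To show what such generic annotations buy, here is a self-contained sketch of a generator carrying the same kind of @template docblock, so that static analysers can infer the element type from the class-string argument. The function sketchGetObjects and the SketchPoint class are hypothetical, and the mapping is deliberately naive (every cell is cast to int and passed as a named constructor argument).

```php
<?php

declare(strict_types=1);

// Hypothetical DTO used only for this illustration.
final class SketchPoint
{
    public function __construct(public readonly int $x, public readonly int $y)
    {
    }
}

/**
 * @template T of object
 *
 * @param class-string<T> $className
 * @param iterable<array<string, string>> $records
 *
 * @return Iterator<int, T>
 */
function sketchGetObjects(string $className, iterable $records): Iterator
{
    foreach ($records as $record) {
        // string keys are unpacked as named arguments, so they must match the constructor parameters
        yield new $className(...array_map('intval', $record));
    }
}

$points = iterator_to_array(sketchGetObjects(SketchPoint::class, [['x' => '1', 'y' => '2']]));
```

With the template in place, a tool such as PHPStan or Psalm can treat each item yielded for `SketchPoint::class` as a `SketchPoint` rather than a bare `object`.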

@nyamsprod
Member Author

@thePanz you can submit a small PR with your addition. I would gladly accept it after review 😉
