Chunk handling

Most of the time, a step is a read-process-write task in which data is handled in subsets of a fixed size. This is called chunk handling. Each chunk acts as a checkpoint during processing: when a step fails, the current chunk is rolled back, while the work done for all previous chunks is kept. On restart (if restart is enabled, which requires a database-persisted repository), the job resumes at the exact chunk where the failure happened.
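The checkpoint and rollback behavior can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the framework's implementation: the source is an in-memory list read by slicing, the target is a Python list, "committing" a chunk means appending it to that list, and the checkpoint is a chunk index handed back to the caller rather than persisted in a database-backed repository. The names `run_step` and `ChunkFailed` are hypothetical.

```python
from typing import Callable, List, Optional, Sequence


class ChunkFailed(Exception):
    """Hypothetical error carrying the index of the chunk that was rolled back."""

    def __init__(self, chunk_index: int) -> None:
        super().__init__(f"chunk {chunk_index} failed and was rolled back")
        self.chunk_index = chunk_index


def run_step(
    source: Sequence[int],
    process: Callable[[int], Optional[int]],
    target: List[int],
    item_count: int,
    start_chunk: int = 0,
) -> int:
    """Process `source` in chunks of `item_count` items, committing each chunk to `target`.

    Returns the number of chunks committed so far (the checkpoint). On failure,
    raises ChunkFailed; no item of the failing chunk ever reaches `target`.
    """
    chunk_index = start_chunk
    position = start_chunk * item_count  # resume at the first item of that chunk
    while position < len(source):
        chunk = source[position:position + item_count]  # read one chunk
        staged: List[int] = []  # uncommitted output of the current chunk
        try:
            for item in chunk:
                result = process(item)
                if result is not None:  # None filters the item out
                    staged.append(result)
        except Exception as exc:
            # Rollback: drop the staged items; earlier chunks stay in target.
            raise ChunkFailed(chunk_index) from exc
        target.extend(staged)  # commit the whole chunk at once
        chunk_index += 1
        position += item_count
    return chunk_index
```

After a failure, calling `run_step` again with `start_chunk=error.chunk_index` resumes at the failed chunk, mirroring the restart behavior described above; chunks committed before the failure are not reprocessed.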

In the following example, the step processes the records supplied by the reader in chunks of 1000 (item-count="1000").

Example 3.2. XML Job Configuration

...
<step id="TitleUpdateStep">
    <chunk item-count="1000">
        <reader ref="TitleUpdateStep/ReadTitles" />
        <processor ref="TitleUpdateStep/Processor" />
        <writer ref="TitleUpdateStep/UpdateTitles" />
    </chunk>
</step>
...

Item Reader

The Item Reader is the step phase that retrieves data from a given source (database, file, etc.). It supplies items from the source until none are left, at which point it returns null to signal that reading is complete.
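The reader contract (supply items, then return null once the source is exhausted) can be sketched in Python, with `None` standing in for null. `IterableReader` and `read_all` are hypothetical names for illustration, not the framework's API; the source is assumed to be any Python iterable.

```python
from typing import Generic, Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


class IterableReader(Generic[T]):
    """Hypothetical reader that supplies one item per read() call."""

    def __init__(self, items: Iterable[T]) -> None:
        self._items: Iterator[T] = iter(items)

    def read(self) -> Optional[T]:
        # next(..., None) returns None once the source is exhausted,
        # which is the signal that reading is complete.
        return next(self._items, None)


def read_all(reader: "IterableReader[T]") -> List[T]:
    """Drain a reader the way a step would: call read() until it returns None."""
    items: List[T] = []
    while (item := reader.read()) is not None:
        items.append(item)
    return items
```

Because `None` doubles as the end-of-input marker, a source that legitimately contains `None` values would stop early under this sketch; a real reader has to keep "no more items" distinct from its data.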

Item Processor

The Item Processor is the step phase that processes the data retrieved by the reader. It can perform any kind of manipulation: filtering based on business logic, updating fields, or transforming an item into a completely different kind of element. It returns the result of processing, which may be the original element unchanged, the original element with updates, or a completely different element. If it returns null, the read element is ignored, which filters it out of the data read from the source.

If no processor is supplied, the items read are passed unchanged to the writer.
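Both processor behaviors, filtering by returning null and transforming an item into a different type, plus the pass-through case when no processor is configured, can be sketched in Python. The `Title` record, `title_processor`, and `apply_processor` are hypothetical illustrations, not types or functions of the framework.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Sequence


@dataclass
class Title:
    """Hypothetical record produced by a title reader."""
    id: int
    name: str
    active: bool


def title_processor(title: Title) -> Optional[str]:
    """Hypothetical processor: drops inactive titles, transforms the rest into strings."""
    if not title.active:
        return None  # filtered: this item never reaches the writer
    return title.name.strip().upper()  # a completely different kind of element


def apply_processor(
    chunk: Sequence[Any],
    processor: Optional[Callable[[Any], Optional[Any]]] = None,
) -> List[Any]:
    """Run a processor over a chunk; without one, items pass through unchanged."""
    if processor is None:
        return list(chunk)
    results: List[Any] = []
    for item in chunk:
        result = processor(item)
        if result is not None:  # None means "ignore this item"
            results.append(result)
    return results
```

With `[Title(1, " dune ", True), Title(2, "draft", False)]` as input, `title_processor` yields `["DUNE"]`: the inactive title is filtered and the active one is turned into a string.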

Item Writer

The Item Writer is the final step phase, which writes items to a target (database, file, etc.). It receives the elements produced by the processor (or directly by the reader when no processor is supplied) one chunk at a time, which enables the rollback mechanism described above.
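To show why writing chunk by chunk enables rollback, here is a minimal Python sketch that writes each chunk inside a single database transaction, using the standard `sqlite3` module and an in-memory database. `TitleWriter` and the `titles` table are hypothetical; the framework's own writers and transaction handling may differ.

```python
import sqlite3
from typing import List, Tuple


class TitleWriter:
    """Hypothetical writer that inserts one chunk of (id, name) rows per call."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def write(self, chunk: List[Tuple[int, str]]) -> None:
        # Using the connection as a context manager commits the transaction
        # if the block succeeds and rolls it back if any insert raises,
        # so a chunk is written entirely or not at all.
        with self._conn:
            self._conn.executemany(
                "INSERT INTO titles (id, name) VALUES (?, ?)", chunk
            )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE titles (id INTEGER PRIMARY KEY, name TEXT)")
    writer = TitleWriter(conn)
    writer.write([(1, "Dune"), (2, "Emma")])  # first chunk: committed
    try:
        writer.write([(3, "Ulysses"), (1, "Duplicate")])  # id 1 already exists
    except sqlite3.IntegrityError:
        pass  # second chunk rolled back, including the valid row 3
    print(conn.execute("SELECT id FROM titles ORDER BY id").fetchall())  # [(1,), (2,)]
```

The valid row `3` disappears together with the duplicate because both belong to the same chunk, while the previously committed chunk is untouched.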