
Chunk handling

Most of the time, a step is a read-process-write task in which the data is processed in subsets of a given size. This is called chunk handling. Each so-called chunk serves as a checkpoint during processing: when a step fails, the current chunk is rolled back, while all previously committed chunks remain saved. On restart (if restart is enabled, which requires a database-persisted repository), the job resumes at the exact chunk where the failure occurred.

In the following example, the step processes records in groups of 1000 items read by the reader; a simplified sketch of the underlying chunk loop follows the configuration.

Example 3.2. XML Job Configuration

...
<step id="TitleUpdateStep">
    <chunk item-count="1000">
        <reader ref="TitleUpdateStep/ReadTitles" />
        <processor ref="TitleUpdateStep/Processor" />
        <writer ref="TitleUpdateStep/UpdateTitles" />
    </chunk>
</step>
...
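
To illustrate the mechanics behind this configuration, here is a minimal sketch of a chunk loop in plain Java. This is not this framework's actual API; all interface and class names are hypothetical. Items are read and processed one by one, accumulated until the item-count threshold is reached, and the whole chunk is then handed to the writer as a single checkpointed unit.

import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal contracts mirroring the reader, processor and writer phases.
interface ItemReader<T>       { T read() throws Exception; }            // returns null when the source is exhausted
interface ItemProcessor<I, O> { O process(I item) throws Exception; }   // returns null to filter an item out
interface ItemWriter<T>       { void write(List<T> chunk) throws Exception; }

final class ChunkOrientedStep<I, O> {
    private final ItemReader<I> reader;
    private final ItemProcessor<I, O> processor;
    private final ItemWriter<O> writer;
    private final int itemCount; // corresponds to the item-count attribute (1000 in the example above)

    ChunkOrientedStep(ItemReader<I> reader, ItemProcessor<I, O> processor,
                      ItemWriter<O> writer, int itemCount) {
        this.reader = reader;
        this.processor = processor;
        this.writer = writer;
        this.itemCount = itemCount;
    }

    void execute() throws Exception {
        List<O> chunk = new ArrayList<>(itemCount);
        I item;
        while ((item = reader.read()) != null) {
            O result = processor.process(item);
            if (result != null) {          // a null result means the item is filtered out
                chunk.add(result);
            }
            if (chunk.size() == itemCount) {
                writer.write(chunk);       // one checkpoint (commit) per chunk
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            writer.write(chunk);           // flush the final, possibly smaller chunk
        }
    }
}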

Item Reader

The Item Reader is the step phase that retrieves data from a given source (database, file, etc.). It supplies items from the source until no more are available, at which point it returns null and its processing is complete.
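
As a hedged illustration of this read-until-null contract, the hypothetical Java reader below reads one record per line from a flat file; the class name and one-record-per-line format are assumptions for the example only.

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical file-based reader: returns one line per read() call,
// and null once the end of the file is reached, signalling that the
// reading phase is complete.
final class FlatFileTitleReader {
    private final BufferedReader input;

    FlatFileTitleReader(Path file) throws IOException {
        this.input = Files.newBufferedReader(file);
    }

    String read() throws IOException {
        String line = input.readLine();
        if (line == null) {
            input.close();   // no more items: completion is signalled with null
        }
        return line;
    }
}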

Item Processor

The Item Processor is the step phase that processes the data retrieved by the reader. It can be used for any kind of manipulation: filtering based on business logic, field updates, or complete transformation into a different kind of element. It returns the result of its processing, which may be the initial element as is, the initial element with updates, or an entirely different element. If it returns null, the read element is ignored, which effectively filters the data read from the source.

If no processor is supplied, the read data is passed as is to the writer.
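
As a hedged illustration, the hypothetical Java processor below upper-cases a title and filters out blank records by returning null; the class name and transformation logic are assumptions for the example only.

import java.util.Locale;

// Hypothetical processor: trims and upper-cases a title, and filters out
// blank records by returning null (a null result drops the item from the chunk,
// so the writer never sees it).
final class TitleUpperCaseProcessor {
    String process(String title) {
        String trimmed = title.trim();
        if (trimmed.isEmpty()) {
            return null;              // filtered: this item is ignored
        }
        return trimmed.toUpperCase(Locale.ROOT);
    }
}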

Item Writer

The Item Writer is the final step phase that writes items to a target (database, file, etc.). It receives the elements produced by the processor chunk by chunk, enabling the rollback mechanics explained above.
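
As a hedged illustration of the writer phase, the hypothetical Java writer below appends a whole chunk of lines to a target file in one call; in a real step this write would run inside the chunk's transaction, so a failure rolls back only the current chunk while previously committed chunks remain saved.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical writer: persists one chunk at a time to a flat file target.
final class FlatFileTitleWriter {
    private final Path target;

    FlatFileTitleWriter(Path target) {
        this.target = target;
    }

    void write(List<String> chunk) throws IOException {
        // One call per chunk: the whole chunk is written (and committed) together.
        Files.write(target, chunk, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}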