Spring Batch from First Principles: Build a CSV-to-H2 Order Processing Job

This learning lab is for a reader who knows basic Java and Spring Boot but has never used Spring Batch. The goal is not only to make a batch job run, but to understand the core Spring Batch structure through a small runnable project: a Job containing a chunk-oriented Step that reads orders from a CSV file, classifies each order, writes processed rows into an H2 business table, and records execution metadata in separate Spring Batch metadata tables.

This article is pinned to:

Java 17+
Spring Boot 3.5.x
Spring Batch 5.2.x
Gradle
H2
Java configuration style

The examples use Spring Boot 3.5.15, which manages Spring Batch 5.2.6. Stay within the Spring Boot 3.5.x and Spring Batch 5.2.x line for this lab. Do not move the lab to Spring Boot 4.x or Spring Batch 6.x without revisiting the version-risk notes near the end.

1. What This Lab Builds

In this lab, a Spring Batch application is a Spring Boot application that runs batch work in a structured way. Spring Batch itself does not require Spring Boot, but Boot supplies the auto-configuration used here. The top-level unit is a Job. A job contains one or more Steps. In a chunk-oriented step, Spring Batch repeatedly reads items, optionally processes them, groups them into chunks, and writes each chunk.

In this lab, you will build this flow:

src/main/resources/orders.csv
        |
        v
ItemReader<OrderInput>
        |
        v
ItemProcessor<OrderInput, ProcessedOrder>
        |
        v
JdbcBatchItemWriter<ProcessedOrder>
        |
        v
processed_orders business table

At the same time, Spring Batch records execution metadata:

JobRepository
        |
        v
BATCH_* metadata tables

These two paths must stay separate in your mental model.

Area	Table examples	Written by	Meaning
Business data	`processed_orders`	`JdbcBatchItemWriter`	The actual processed order rows produced by this lab
Spring Batch metadata	`BATCH_JOB_INSTANCE`, `BATCH_JOB_EXECUTION`, `BATCH_STEP_EXECUTION`, `BATCH_JOB_EXECUTION_PARAMS`	`JobRepository`	Infrastructure records that describe which jobs and steps ran, with what parameters and status

The most important distinction in this lab is:

JobRepository stores metadata about the batch execution. It does not store your processed orders.

The processed orders go into the business table. The Batch execution history goes into BATCH_* tables.

2. The Core Vocabulary Before Code

`Job`

A Job is the named top-level batch workflow.

In this lab:

Job: importOrdersJob

Its role is to define the overall batch process and connect that process to Spring Batch’s execution model.

A Job does not read CSV rows directly. It does not classify order amounts directly. It does not insert business rows directly. Those responsibilities belong to components inside the job.

`Step`

A Step is one executable phase inside a job.

In this lab:

Step: importOrdersStep

This step is chunk-oriented. It connects:

reader -> processor -> writer

The step is where we say:

Read items, process them, group them into chunks, and write each chunk transactionally.

`ItemReader`

An ItemReader<I> reads one input item at a time.

In this lab:

ItemReader<OrderInput>

It reads rows from orders.csv and turns each CSV row into an OrderInput.

Its boundary:

It knows how to read the input.
It should not classify the order.
It should not insert rows into the database.

`ItemProcessor`

An ItemProcessor<I, O> transforms one item into another item.

In this lab:

ItemProcessor<OrderInput, ProcessedOrder>

It receives one OrderInput, applies the classification rule, and returns one ProcessedOrder.

Classification rule:

amount >= 1000 -> HIGH_VALUE
otherwise      -> STANDARD

Its boundary:

It owns business transformation.
It should not know how to read the CSV file.
It should not know how to write SQL.

`ItemWriter`

An ItemWriter<O> writes processed items.

In this lab:

JdbcBatchItemWriter<ProcessedOrder>

It receives chunks of processed orders and inserts them into the business table:

processed_orders

Its boundary:

It writes business output.
It does not create JobInstance records.
It does not write Batch metadata tables.

`JobRepository`

JobRepository is Spring Batch’s metadata store.

It records things like:

job instances
job executions
job parameters
step executions
status
read count
write count
start and end times

Its boundary:

It stores Batch execution metadata.
It does not store the processed order rows.

This distinction matters because both the business table and the Batch metadata tables can live in the same H2 database during the lab. Same database does not mean same responsibility.

`JobParameters`

JobParameters are values passed to a job run.

They matter because Spring Batch uses identifying job parameters as part of JobInstance identity.

The simplified rule is:

JobInstance = Job name + identifying JobParameters

So these two runs are different logical job instances:

importOrdersJob + run.id=1
importOrdersJob + run.id=2

This lab uses explicit job parameters instead of RunIdIncrementer so that the identity rule stays visible.
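The identity rule can be sketched in plain Java. This is an illustration only, not Spring Batch's implementation: the class, the `Param` and `JobInstanceKey` records, the `keyFor` helper, and the `note` parameter are all hypothetical names introduced for this sketch. It assumes only that non-identifying parameters are ignored when deciding whether two runs belong to the same logical instance.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of "JobInstance = job name + identifying JobParameters".
// None of these types exist in Spring Batch; they only model the rule.
class JobInstanceIdentitySketch {

    // One parameter value plus the flag that says whether it contributes to identity.
    record Param(String value, boolean identifying) {}

    // The logical identity: the job name plus only the identifying parameters.
    record JobInstanceKey(String jobName, Map<String, String> identifyingParams) {}

    static JobInstanceKey keyFor(String jobName, Map<String, Param> params) {
        Map<String, String> identifying = new TreeMap<>();
        params.forEach((name, param) -> {
            if (param.identifying()) {
                identifying.put(name, param.value());
            }
        });
        return new JobInstanceKey(jobName, identifying);
    }

    public static void main(String[] args) {
        JobInstanceKey run1 = keyFor("importOrdersJob", Map.of("run.id", new Param("1", true)));
        JobInstanceKey run2 = keyFor("importOrdersJob", Map.of("run.id", new Param("2", true)));
        // Same identifying value plus an extra non-identifying parameter.
        JobInstanceKey run1Again = keyFor("importOrdersJob", Map.of(
                "run.id", new Param("1", true),
                "note", new Param("retry", false)));

        System.out.println("run1 vs run2 same instance?      " + run1.equals(run2));
        System.out.println("run1 vs run1Again same instance? " + run1.equals(run1Again));
    }
}
```

In this model, changing run.id produces a different key, while adding a non-identifying parameter does not.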

3. Create the Project with Spring Initializr

Use Spring Initializr as the primary project creation path.

Create a project with these settings:

Setting	Value
Project	Gradle
Language	Java
Spring Boot	`3.5.x`
Group	`com.example`
Artifact	`spring-batch-orders-lab`
Name	`spring-batch-orders-lab`
Package name	`com.example.batchorders`
Packaging	Jar
Java	17 or newer

Add these dependencies:

Spring Batch
JDBC API
H2 Database

Download the project, unzip it, and enter the project directory:

cd spring-batch-orders-lab

The final project tree will look like this:

spring-batch-orders-lab/
|-- .gitignore
|-- build.gradle
|-- settings.gradle
|-- data/
|   `-- h2/
|       `-- ...
`-- src/
    `-- main/
        |-- java/
        |   `-- com/
        |       `-- example/
        |           `-- batchorders/
        |               |-- BatchOrdersApplication.java
        |               |-- batch/
        |               |   `-- BatchConfig.java
        |               `-- model/
        |                   |-- OrderInput.java
        |                   `-- ProcessedOrder.java
        `-- resources/
            |-- application.yml
            |-- orders.csv
            `-- schema.sql

Create the missing directories:

mkdir -p src/main/java/com/example/batchorders/batch
mkdir -p src/main/java/com/example/batchorders/model
mkdir -p data/h2

Update .gitignore so local database files are not committed:

.gradle/
build/
data/

The data/ directory is local lab state. It contains generated H2 database files, not source code.

Your generated settings.gradle should be equivalent to:

rootProject.name = 'spring-batch-orders-lab'

Your generated build.gradle may differ slightly depending on Spring Initializr, but for this lab it should be equivalent to:

plugins {
    id 'java'
    id 'org.springframework.boot' version '3.5.15'
    id 'io.spring.dependency-management' version '1.1.7'
}
 
group = 'com.example'
version = '0.0.1-SNAPSHOT'
 
java {
    toolchain {
        languageVersion = JavaLanguageVersion.of(17)
    }
}
 
repositories {
    mavenCentral()
}
 
dependencies {
    implementation 'org.springframework.boot:spring-boot-starter-batch'
    implementation 'org.springframework.boot:spring-boot-starter-jdbc'
 
    runtimeOnly 'com.h2database:h2'
 
    testImplementation 'org.springframework.boot:spring-boot-starter-test'
    testImplementation 'org.springframework.batch:spring-batch-test'
}
 
tasks.named('test') {
    useJUnitPlatform()
}
 
tasks.register('h2Shell', JavaExec) {
    group = 'verification'
    description = 'Open an H2 shell connected to the lab database'
    classpath = sourceSets.main.runtimeClasspath
    mainClass = 'org.h2.tools.Shell'
    args '-url', 'jdbc:h2:file:./data/h2/ordersdb',
         '-user', 'sa',
         '-password', ''
    standardInput = System.in
}

This build gives the project four important capabilities:

Part	Role
`spring-boot-starter-batch`	Adds Spring Batch and Boot’s Batch auto-configuration
`spring-boot-starter-jdbc`	Adds JDBC and transaction infrastructure
`h2`	Provides the local lab database
`h2Shell` task	Lets you inspect the file-based H2 database after the app exits

Build once to check the project:

./gradlew clean build

Expected result:

BUILD SUCCESSFUL

Use or rename the generated Spring Boot application class as BatchOrdersApplication when you reach the application class step. Do not keep two @SpringBootApplication classes in the project.

4. Configure H2, Startup Execution, and Reset Behavior

Create src/main/resources/application.yml:

spring:
  main:
    web-application-type: none
 
  datasource:
    url: jdbc:h2:file:./data/h2/ordersdb
    username: sa
    password:
    driver-class-name: org.h2.Driver
 
  sql:
    init:
      mode: always
 
  batch:
    job:
      enabled: true
      name: importOrdersJob
    jdbc:
      initialize-schema: never
 
logging:
  level:
    org.springframework.batch: INFO

This configuration does several separate things.

First, the app is non-web:

spring:
  main:
    web-application-type: none

A batch application does not need to behave like a web server. In this lab, the process should start, run the job, and exit.

Second, H2 is file-based:

jdbc:h2:file:./data/h2/ordersdb

This lets you inspect tables after the Java process exits.

Third, normal SQL initialization is enabled:

spring:
  sql:
    init:
      mode: always

This causes Spring Boot to run schema.sql on startup.

In this lab, schema.sql is responsible only for the business table:

processed_orders

It does not create or reset Spring Batch metadata tables.

Fourth, Batch metadata initialization is disabled by default:

spring:
  batch:
    jdbc:
      initialize-schema: never

This is deliberate.

Because the H2 database is file-based, the BATCH_* metadata tables can remain after the first run. If the application tries to initialize the Batch metadata schema on every run, it may attempt to create tables that already exist.

So the lab uses this strategy:

Run type	What to do
First run after a full lab reset	Temporarily initialize Batch metadata with a Spring Boot command-line property
Later runs	Do not initialize Batch metadata again
Full reset	Delete `data/h2`

The first run after a full reset will use this command:

java -jar build/libs/spring-batch-orders-lab-0.0.1-SNAPSHOT.jar --spring.batch.jdbc.initialize-schema=always run.id=1,java.lang.Long,true

Later runs will use commands like this:

java -jar build/libs/spring-batch-orders-lab-0.0.1-SNAPSHOT.jar run.id=2,java.lang.Long,true

Notice the two different kinds of command-line argument:

Argument	Kind	Uses `--`?	Role
`--spring.batch.jdbc.initialize-schema=always`	Spring Boot application property	Yes	Temporarily tells Boot to initialize Spring Batch metadata tables
`run.id=1,java.lang.Long,true`	Spring Batch job parameter	No	Identifies this job run as `run.id=1`

This distinction is important:

Boot properties use --
Batch job parameters do not use --
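The split between the two argument kinds can be modelled with a few lines of plain Java. This is a simplified sketch, not Spring Boot's actual argument parser: the `ArgumentKindsSketch` class, the `SplitArgs` record, and the `split` method are hypothetical names, and the only rule it applies is the one stated above (a leading `--` marks a Boot property).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: separates "--" option arguments from plain arguments.
// It models the lab's rule only and is not Spring Boot's parser.
class ArgumentKindsSketch {

    record SplitArgs(List<String> bootProperties, List<String> jobParameters) {}

    static SplitArgs split(String... args) {
        List<String> bootProperties = new ArrayList<>();
        List<String> jobParameters = new ArrayList<>();
        for (String arg : args) {
            if (arg.startsWith("--")) {
                bootProperties.add(arg.substring(2)); // drop the leading "--"
            } else {
                jobParameters.add(arg);               // left untouched for the job launcher
            }
        }
        return new SplitArgs(bootProperties, jobParameters);
    }

    public static void main(String[] args) {
        SplitArgs result = split(
                "--spring.batch.jdbc.initialize-schema=always",
                "run.id=1,java.lang.Long,true");
        System.out.println("Boot properties: " + result.bootProperties());
        System.out.println("Job parameters:  " + result.jobParameters());
    }
}
```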

Lab reset strategy

There are two reset levels.

Business table reset

The business table is reset by schema.sql on every application startup.

That means every successful run writes a fresh set of processed orders.

This keeps the main lab deterministic.

Batch metadata persistence

Spring Batch metadata is not reset by schema.sql.

The BATCH_* tables remain in the file-based H2 database, so you can see multiple job instances after running with different run.id values.

Full lab reset

To reset both business data and Batch metadata, delete the H2 files:

rm -rf data/h2
mkdir -p data/h2

PowerShell equivalent:

Remove-Item -Recurse -Force data\h2 -ErrorAction SilentlyContinue
New-Item -ItemType Directory -Force data\h2

A full lab reset deletes both:

business data
Spring Batch metadata

A normal application run resets only:

business data in processed_orders

This separation is deliberate.

5. Add the Business Table and Input CSV

Create src/main/resources/schema.sql:

DROP TABLE IF EXISTS processed_orders;
 
CREATE TABLE processed_orders (
    order_id VARCHAR(50) NOT NULL PRIMARY KEY,
    customer_id VARCHAR(50) NOT NULL,
    amount DECIMAL(19, 2) NOT NULL,
    currency VARCHAR(3) NOT NULL,
    order_tier VARCHAR(20) NOT NULL,
    processed_at TIMESTAMP NOT NULL
);

This file creates the business table:

processed_orders

It does not create Spring Batch metadata tables.

It does not drop BATCH_JOB_INSTANCE.

It does not drop BATCH_JOB_EXECUTION.

It does not drop BATCH_STEP_EXECUTION.

That means this file is responsible for the application’s domain output, not Spring Batch’s execution history.

The business table columns are:

order_id
customer_id
amount
currency
order_tier
processed_at

The input CSV will not contain order_tier or processed_at. Those values are created during processing.

Create src/main/resources/orders.csv:

order_id,customer_id,amount,currency
O-1001,C-001,249.99,USD
O-1002,C-002,1000.00,USD
O-1003,C-003,1575.50,USD
O-1004,C-001,35.00,EUR
O-1005,C-004,999.99,USD
O-1006,C-005,2500.00,KRW

Before writing Java code, predict the classifications:

`order_id`	`amount`	Expected `order_tier`
`O-1001`	`249.99`	`STANDARD`
`O-1002`	`1000.00`	`HIGH_VALUE`
`O-1003`	`1575.50`	`HIGH_VALUE`
`O-1004`	`35.00`	`STANDARD`
`O-1005`	`999.99`	`STANDARD`
`O-1006`	`2500.00`	`HIGH_VALUE`

The boundary case is 1000.00.

Because the rule is amount >= 1000, an order with amount exactly 1000.00 is HIGH_VALUE.
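The boundary case is easy to check in isolation. The sketch below is plain Java with no Spring involved; the `OrderTierSketch` class and `tierFor` method are hypothetical names used only here. It relies on `BigDecimal.compareTo`, which compares numeric value and ignores scale, so `1000.00` and `1000` are treated as equal amounts. `BigDecimal.equals` would not treat them as equal, because it also compares scale.

```java
import java.math.BigDecimal;

// Hypothetical standalone check of the classification rule: amount >= 1000 -> HIGH_VALUE.
class OrderTierSketch {

    static final BigDecimal HIGH_VALUE_THRESHOLD = new BigDecimal("1000");

    static String tierFor(BigDecimal amount) {
        // compareTo ignores scale, so 1000.00 compares as equal to 1000.
        return amount.compareTo(HIGH_VALUE_THRESHOLD) >= 0 ? "HIGH_VALUE" : "STANDARD";
    }

    public static void main(String[] args) {
        for (String amount : new String[] {"249.99", "1000.00", "1575.50", "35.00", "999.99", "2500.00"}) {
            System.out.println(amount + " -> " + tierFor(new BigDecimal(amount)));
        }
    }
}
```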

6. Add the Application Class and Row Models

Use or rename the generated application class as src/main/java/com/example/batchorders/BatchOrdersApplication.java:

package com.example.batchorders;
 
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
 
@SpringBootApplication
public class BatchOrdersApplication {
 
    public static void main(String[] args) {
        SpringApplication.run(BatchOrdersApplication.class, args);
    }
}

This is the Spring Boot entry point.

It does not manually open the CSV file. It does not manually launch the writer. Spring Boot starts the application context, and Boot’s Batch support launches the configured job on startup.

If Spring Initializr generated a differently named application class, rename it or replace it with this one. Do not leave a second @SpringBootApplication class in the project.

Create src/main/java/com/example/batchorders/model/OrderInput.java:

package com.example.batchorders.model;
 
import java.math.BigDecimal;
 
public record OrderInput(
        String orderId,
        String customerId,
        BigDecimal amount,
        String currency
) {
}

OrderInput is the shape of one input row.

It represents this CSV structure:

order_id,customer_id,amount,currency

The Java names use camelCase:

orderId
customerId
amount
currency

The CSV header does not automatically become Java properties. We will explicitly map fields in the reader.

Create src/main/java/com/example/batchorders/model/ProcessedOrder.java:

package com.example.batchorders.model;
 
import java.math.BigDecimal;
import java.time.LocalDateTime;
 
public record ProcessedOrder(
        String orderId,
        String customerId,
        BigDecimal amount,
        String currency,
        String orderTier,
        LocalDateTime processedAt
) {
}

ProcessedOrder is the shape of one output row.

It has everything from OrderInput, plus:

orderTier
processedAt

Those extra values are created by the processor.

Conceptually:

OrderInput
  = what arrived from the CSV
 
ProcessedOrder
  = what the batch job produced for the business table

7. Configure the Reader, Processor, Writer, Step, and Job

Create src/main/java/com/example/batchorders/batch/BatchConfig.java:

package com.example.batchorders.batch;
 
import com.example.batchorders.model.OrderInput;
import com.example.batchorders.model.ProcessedOrder;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;
import org.springframework.jdbc.core.namedparam.MapSqlParameterSource;
import org.springframework.transaction.PlatformTransactionManager;
 
import javax.sql.DataSource;
import java.math.BigDecimal;
import java.sql.Timestamp;
import java.time.Clock;
import java.time.LocalDateTime;
 
@Configuration
public class BatchConfig {
 
    private static final BigDecimal HIGH_VALUE_THRESHOLD = new BigDecimal("1000");
 
    @Bean
    public Clock clock() {
        return Clock.systemUTC();
    }
 
    @Bean
    public FlatFileItemReader<OrderInput> orderReader() {
        return new FlatFileItemReaderBuilder<OrderInput>()
                .name("orderReader")
                .resource(new ClassPathResource("orders.csv"))
                .linesToSkip(1)
                .delimited()
                .names("orderId", "customerId", "amount", "currency")
                .fieldSetMapper(fieldSet -> new OrderInput(
                        fieldSet.readString("orderId"),
                        fieldSet.readString("customerId"),
                        fieldSet.readBigDecimal("amount"),
                        fieldSet.readString("currency")
                ))
                .build();
    }
 
    @Bean
    public ItemProcessor<OrderInput, ProcessedOrder> orderProcessor(Clock clock) {
        return order -> {
            String orderTier = order.amount().compareTo(HIGH_VALUE_THRESHOLD) >= 0
                    ? "HIGH_VALUE"
                    : "STANDARD";
 
            return new ProcessedOrder(
                    order.orderId(),
                    order.customerId(),
                    order.amount(),
                    order.currency(),
                    orderTier,
                    LocalDateTime.now(clock)
            );
        };
    }
 
    @Bean
    public JdbcBatchItemWriter<ProcessedOrder> orderWriter(DataSource dataSource) {
        return new JdbcBatchItemWriterBuilder<ProcessedOrder>()
                .dataSource(dataSource)
                .sql("""
                        INSERT INTO processed_orders (
                            order_id,
                            customer_id,
                            amount,
                            currency,
                            order_tier,
                            processed_at
                        )
                        VALUES (
                            :orderId,
                            :customerId,
                            :amount,
                            :currency,
                            :orderTier,
                            :processedAt
                        )
                        """)
                .itemSqlParameterSourceProvider(order -> new MapSqlParameterSource()
                        .addValue("orderId", order.orderId())
                        .addValue("customerId", order.customerId())
                        .addValue("amount", order.amount())
                        .addValue("currency", order.currency())
                        .addValue("orderTier", order.orderTier())
                        .addValue("processedAt", Timestamp.valueOf(order.processedAt())))
                .build();
    }
 
    @Bean
    public Step importOrdersStep(
            JobRepository jobRepository,
            PlatformTransactionManager transactionManager,
            ItemReader<OrderInput> orderReader,
            ItemProcessor<OrderInput, ProcessedOrder> orderProcessor,
            ItemWriter<ProcessedOrder> orderWriter
    ) {
        return new StepBuilder("importOrdersStep", jobRepository)
                .<OrderInput, ProcessedOrder>chunk(3, transactionManager)
                .reader(orderReader)
                .processor(orderProcessor)
                .writer(orderWriter)
                .build();
    }
 
    @Bean
    public Job importOrdersJob(
            JobRepository jobRepository,
            Step importOrdersStep
    ) {
        return new JobBuilder("importOrdersJob", jobRepository)
                .start(importOrdersStep)
                .build();
    }
}

This one configuration class defines the whole Batch structure.

Read it in layers.

Reader checkpoint

@Bean
public FlatFileItemReader<OrderInput> orderReader() {
    return new FlatFileItemReaderBuilder<OrderInput>()
            .name("orderReader")
            .resource(new ClassPathResource("orders.csv"))
            .linesToSkip(1)
            .delimited()
            .names("orderId", "customerId", "amount", "currency")
            .fieldSetMapper(fieldSet -> new OrderInput(
                    fieldSet.readString("orderId"),
                    fieldSet.readString("customerId"),
                    fieldSet.readBigDecimal("amount"),
                    fieldSet.readString("currency")
            ))
            .build();
}

The reader owns the input boundary.

It answers:

Where is the input, and how does one raw record become one Java input item?

Important details:

resource(new ClassPathResource("orders.csv")) reads from src/main/resources/orders.csv.
linesToSkip(1) skips the CSV header.
names(...) assigns field names used by the mapper.
fieldSetMapper(...) creates an OrderInput.

The reader does not classify the amount. The reader does not write to the database.
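To see what the reader configuration amounts to, here is a rough plain-Java equivalent. It is a sketch under simplifying assumptions, not how FlatFileItemReader is implemented: the `CsvMappingSketch` class and `parse` method are hypothetical, fields are mapped by position rather than by name, and there is no handling of quoted fields or malformed rows.

```java
import java.math.BigDecimal;
import java.util.List;

// Hypothetical sketch: skip the header, split on commas, map each row to an OrderInput.
class CsvMappingSketch {

    record OrderInput(String orderId, String customerId, BigDecimal amount, String currency) {}

    static List<OrderInput> parse(List<String> lines) {
        return lines.stream()
                .skip(1)                          // plays the role of linesToSkip(1)
                .map(line -> line.split(","))     // plays the role of delimited()
                .map(fields -> new OrderInput(    // plays the role of the field set mapper
                        fields[0],
                        fields[1],
                        new BigDecimal(fields[2]),
                        fields[3]))
                .toList();
    }

    public static void main(String[] args) {
        List<OrderInput> orders = parse(List.of(
                "order_id,customer_id,amount,currency",
                "O-1001,C-001,249.99,USD",
                "O-1002,C-002,1000.00,USD"));
        orders.forEach(System.out::println);
    }
}
```

Note that the sketch, like the reader, stops at producing input items. Classification and persistence stay out of it.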

Processor checkpoint

@Bean
public ItemProcessor<OrderInput, ProcessedOrder> orderProcessor(Clock clock) {
    return order -> {
        String orderTier = order.amount().compareTo(HIGH_VALUE_THRESHOLD) >= 0
                ? "HIGH_VALUE"
                : "STANDARD";
 
        return new ProcessedOrder(
                order.orderId(),
                order.customerId(),
                order.amount(),
                order.currency(),
                orderTier,
                LocalDateTime.now(clock)
        );
    };
}

The processor owns item transformation.

It answers:

Given one valid input item, what business output item should be produced?

The processor:

receives one OrderInput
compares amount with 1000
creates orderTier
adds processedAt
returns one ProcessedOrder

The processor does not know where the CSV file is. It does not execute SQL.
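The processor receives a Clock instead of calling LocalDateTime.now() directly. One plausible benefit, which the lab does not state explicitly, is that the timestamp becomes controllable in tests. The sketch below illustrates that idea in plain Java; the `FixedClockSketch` class, the `processedAt` helper, and the fixed instant are hypothetical choices made for this example.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical sketch: the same call the processor makes, driven by a fixed clock.
class FixedClockSketch {

    static LocalDateTime processedAt(Clock clock) {
        return LocalDateTime.now(clock);
    }

    public static void main(String[] args) {
        // Arbitrary instant chosen for illustration only.
        Clock fixed = Clock.fixed(Instant.parse("2024-01-01T00:00:00Z"), ZoneOffset.UTC);
        System.out.println("fixed clock:  " + processedAt(fixed));
        System.out.println("system clock: " + processedAt(Clock.systemUTC()));
    }
}
```

With a fixed clock the result is the same on every call, which makes an assertion on processedAt straightforward.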

Writer checkpoint

@Bean
public JdbcBatchItemWriter<ProcessedOrder> orderWriter(DataSource dataSource) {
    return new JdbcBatchItemWriterBuilder<ProcessedOrder>()
            .dataSource(dataSource)
            .sql("""
                    INSERT INTO processed_orders (
                        order_id,
                        customer_id,
                        amount,
                        currency,
                        order_tier,
                        processed_at
                    )
                    VALUES (
                        :orderId,
                        :customerId,
                        :amount,
                        :currency,
                        :orderTier,
                        :processedAt
                    )
                    """)
            .itemSqlParameterSourceProvider(order -> new MapSqlParameterSource()
                    .addValue("orderId", order.orderId())
                    .addValue("customerId", order.customerId())
                    .addValue("amount", order.amount())
                    .addValue("currency", order.currency())
                    .addValue("orderTier", order.orderTier())
                    .addValue("processedAt", Timestamp.valueOf(order.processedAt())))
            .build();
}

The writer owns the output boundary.

It answers:

Where do processed items go?

In this lab, processed items go into:

processed_orders

The named parameters in the SQL refer to ProcessedOrder values:

SQL parameter	Comes from
`:orderId`	`ProcessedOrder.orderId()`
`:customerId`	`ProcessedOrder.customerId()`
`:amount`	`ProcessedOrder.amount()`
`:currency`	`ProcessedOrder.currency()`
`:orderTier`	`ProcessedOrder.orderTier()`
`:processedAt`	`ProcessedOrder.processedAt()`

This writer inserts business rows.

It does not insert into BATCH_JOB_INSTANCE.

It does not insert into BATCH_STEP_EXECUTION.

Those metadata writes belong to JobRepository.

Step checkpoint

@Bean
public Step importOrdersStep(
        JobRepository jobRepository,
        PlatformTransactionManager transactionManager,
        ItemReader<OrderInput> orderReader,
        ItemProcessor<OrderInput, ProcessedOrder> orderProcessor,
        ItemWriter<ProcessedOrder> orderWriter
) {
    return new StepBuilder("importOrdersStep", jobRepository)
            .<OrderInput, ProcessedOrder>chunk(3, transactionManager)
            .reader(orderReader)
            .processor(orderProcessor)
            .writer(orderWriter)
            .build();
}

The reader, processor, and writer do not run by themselves. They become executable batch work when connected inside a Step.

This is current Spring Batch 5 style.

Notice what it does not use:

JobBuilderFactory
StepBuilderFactory

Older tutorials often use those factory classes. This lab does not.

The key line is:

.<OrderInput, ProcessedOrder>chunk(3, transactionManager)

This means:

Read and process up to 3 items, then write those processed items as one chunk inside a transaction boundary.

Chunk size is not the total number of rows in the job.

Chunk size is the grouping size for processing, writing, and committing.

With 6 input rows and chunk size 3, the step should write two chunks:

chunk 1: rows 1, 2, 3
chunk 2: rows 4, 5, 6
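The grouping can be traced with a small plain-Java loop. This is a conceptual sketch, not Spring Batch's step implementation: the `ChunkLoopSketch` class and `runChunks` method are hypothetical, an Iterator stands in for the reader, and adding a list to `writtenChunks` stands in for one writer call inside one transaction. Error handling, restart state, and real transactions are left out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of a chunk-oriented loop: read up to N, process them, write them as one group.
class ChunkLoopSketch {

    static <I, O> List<List<O>> runChunks(Iterator<I> reader, Function<I, O> processor, int chunkSize) {
        List<List<O>> writtenChunks = new ArrayList<>();
        while (reader.hasNext()) {
            // Read phase: collect at most chunkSize items.
            List<I> items = new ArrayList<>();
            while (items.size() < chunkSize && reader.hasNext()) {
                items.add(reader.next());
            }
            // Process phase: transform each item that was read.
            List<O> processed = new ArrayList<>();
            for (I item : items) {
                processed.add(processor.apply(item));
            }
            // Write phase: one writer call per chunk.
            writtenChunks.add(processed);
        }
        return writtenChunks;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("O-1001", "O-1002", "O-1003", "O-1004", "O-1005", "O-1006");
        List<List<String>> chunks = runChunks(rows.iterator(), row -> row + ":processed", 3);
        for (int i = 0; i < chunks.size(); i++) {
            System.out.println("chunk " + (i + 1) + ": " + chunks.get(i));
        }
    }
}
```

If the input had seven rows instead of six, the same loop would produce a third chunk containing a single item, which is why the rule says "up to" the chunk size.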

The JobRepository records step execution metadata, such as read and write counts. The PlatformTransactionManager controls the transaction around chunk work.

Job checkpoint

@Bean
public Job importOrdersJob(
        JobRepository jobRepository,
        Step importOrdersStep
) {
    return new JobBuilder("importOrdersJob", jobRepository)
            .start(importOrdersStep)
            .build();
}

The job is the top-level workflow:

importOrdersJob
    `-- importOrdersStep
        |-- orderReader
        |-- orderProcessor
        `-- orderWriter

This lab has only one step, but Job and Step are still different concepts.

The job owns the top-level identity. The step owns one executable phase of work.

8. Build and Run the First Job Instance

Build the jar:

./gradlew clean bootJar

Expected result:

BUILD SUCCESSFUL

If your project name or version differs, the jar name may differ. With the build.gradle shown above, the jar path is:

build/libs/spring-batch-orders-lab-0.0.1-SNAPSHOT.jar

Because this is the first run after creating or resetting the file-based H2 database, initialize the Spring Batch metadata schema once:

java -jar build/libs/spring-batch-orders-lab-0.0.1-SNAPSHOT.jar --spring.batch.jdbc.initialize-schema=always run.id=1,java.lang.Long,true

This command contains two different kinds of arguments:

--spring.batch.jdbc.initialize-schema=always

is a Spring Boot application property.

It temporarily overrides the default in application.yml and tells Boot to create the Spring Batch metadata tables.

run.id=1,java.lang.Long,true

is a Spring Batch job parameter.

It identifies this job instance as run.id=1.

The Batch job parameter format is:

key=value,type,identifying

So this:

run.id=1,java.lang.Long,true

means:

Part	Meaning
`run.id`	parameter name
`1`	parameter value
`java.lang.Long`	parameter type
`true`	identifying parameter

Because run.id is identifying, it participates in JobInstance identity.
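The four parts of the argument can be pulled apart with a short plain-Java sketch. It is not the converter Spring Batch uses: the `JobParameterFormatSketch` class, the `ParsedParameter` record, and the `parse` method are hypothetical, and the sketch assumes all three comma-separated parts are present and that the value itself contains no comma.

```java
// Hypothetical sketch: splits "key=value,type,identifying" into its four parts.
class JobParameterFormatSketch {

    record ParsedParameter(String name, String value, String type, boolean identifying) {}

    static ParsedParameter parse(String argument) {
        int equalsIndex = argument.indexOf('=');
        String name = argument.substring(0, equalsIndex);
        // Assumes exactly three comma-separated parts after the '='.
        String[] parts = argument.substring(equalsIndex + 1).split(",");
        return new ParsedParameter(name, parts[0], parts[1], Boolean.parseBoolean(parts[2]));
    }

    public static void main(String[] args) {
        System.out.println(parse("run.id=1,java.lang.Long,true"));
    }
}
```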

Do not write the job parameter as:

--run.id=1

That would look like a Spring Boot application property. In this lab, Batch job parameters do not use --.

A successful run should include logs similar to:

Job: [SimpleJob: [name=importOrdersJob]] launched
Executing step: [importOrdersStep]
Step: [importOrdersStep] executed
Job: [SimpleJob: [name=importOrdersJob]] completed

The exact timestamps and thread names will differ.

The application exits after the job completes. That is expected.

This app is not a web server. It starts, runs one batch job, records metadata, writes business output, and exits.

Optional `bootRun` form

The jar command above is the canonical command for this lab.

If you use Gradle bootRun, quote the argument string so Gradle passes it correctly:

./gradlew bootRun --args='--spring.batch.jdbc.initialize-schema=always run.id=1,java.lang.Long,true'

PowerShell usually uses double quotes:

./gradlew bootRun --args="--spring.batch.jdbc.initialize-schema=always run.id=1,java.lang.Long,true"

Inside the quoted string, the same distinction still holds:

--spring.batch.jdbc.initialize-schema=always
  = Boot property
 
run.id=1,java.lang.Long,true
  = Batch job parameter

9. Inspect the Business Data and Batch Metadata

The application has exited, but the H2 database is file-based, so you can inspect it.

Open the H2 shell:

./gradlew -q h2Shell

You should get an interactive prompt like:

sql>

Check the business table

Run:

SELECT COUNT(*) AS processed_order_count
FROM processed_orders;

Expected count:

6

Then inspect the rows:

SELECT order_id, customer_id, amount, currency, order_tier, processed_at
FROM processed_orders
ORDER BY order_id;

Expected business result:

`order_id`	`amount`	`order_tier`
`O-1001`	`249.99`	`STANDARD`
`O-1002`	`1000.00`	`HIGH_VALUE`
`O-1003`	`1575.50`	`HIGH_VALUE`
`O-1004`	`35.00`	`STANDARD`
`O-1005`	`999.99`	`STANDARD`
`O-1006`	`2500.00`	`HIGH_VALUE`

This proves that the reader, processor, and writer pipeline worked:

CSV row -> OrderInput -> ProcessedOrder -> processed_orders row

These are business rows. They are not Batch metadata rows.

Check the Batch metadata tables

Now inspect the metadata tables.

Run:

SELECT JOB_INSTANCE_ID, JOB_NAME, JOB_KEY
FROM BATCH_JOB_INSTANCE;

This table records logical job instances.

Then run:

SELECT JOB_EXECUTION_ID, JOB_INSTANCE_ID, STATUS, START_TIME, END_TIME
FROM BATCH_JOB_EXECUTION
ORDER BY JOB_EXECUTION_ID;

This table records actual executions of job instances.

Then run:

SELECT STEP_EXECUTION_ID, STEP_NAME, STATUS, READ_COUNT, WRITE_COUNT
FROM BATCH_STEP_EXECUTION
ORDER BY STEP_EXECUTION_ID;

Expected important values:

STEP_NAME   = importOrdersStep
STATUS      = COMPLETED
READ_COUNT  = 6
WRITE_COUNT = 6

Finally, inspect job parameters:

SELECT JOB_EXECUTION_ID, PARAMETER_NAME, PARAMETER_TYPE, PARAMETER_VALUE, IDENTIFYING
FROM BATCH_JOB_EXECUTION_PARAMS
ORDER BY JOB_EXECUTION_ID, PARAMETER_NAME;

You should see one row for run.id, with PARAMETER_TYPE java.lang.Long, PARAMETER_VALUE 1, and the IDENTIFYING flag set.

The important distinction is:

processed_orders
  = business result
 
BATCH_* tables
  = Spring Batch execution metadata

READ_COUNT = 6 does not mean Spring Batch stored your six orders in BATCH_STEP_EXECUTION.

It means the step execution metadata says:

This step read six items.

The actual processed order rows are in processed_orders.

Exit H2 shell:

quit

10. Rerun with a Different `JobParameter`

For later runs, do not initialize the Batch metadata schema again.

Run the application with a different identifying parameter:

java -jar build/libs/spring-batch-orders-lab-0.0.1-SNAPSHOT.jar run.id=2,java.lang.Long,true

This creates a different JobInstance because the identifying parameter changed.

The identity rule is:

JobInstance = Job name + identifying JobParameters

So these are different:

importOrdersJob + run.id=1
importOrdersJob + run.id=2

Open H2 shell again:

./gradlew -q h2Shell

Check business rows:

SELECT COUNT(*) AS processed_order_count
FROM processed_orders;

Expected:

processed_order_count = 6

Why not 12?

Because schema.sql drops and recreates processed_orders on every application startup. Each run starts from an empty business table and inserts the same six rows.

Now check metadata:

SELECT JOB_INSTANCE_ID, JOB_NAME
FROM BATCH_JOB_INSTANCE
ORDER BY JOB_INSTANCE_ID;

You should now see more than one job instance.

Check job parameters:

SELECT JOB_EXECUTION_ID, PARAMETER_NAME, PARAMETER_TYPE, PARAMETER_VALUE, IDENTIFYING
FROM BATCH_JOB_EXECUTION_PARAMS
ORDER BY JOB_EXECUTION_ID, PARAMETER_NAME;

You should see both parameter values across executions:

run.id = 1
run.id = 2

This is the core rerun lesson:

Changing run.id changes the Batch job identity. It does not mean the business table itself manages duplicate rows.

In this lab, duplicate business rows are avoided by resetting processed_orders at startup. That is a lab simplification. In a production design, you would make an explicit business decision about whether writes should insert, update, merge, ignore duplicates, or fail.

What if you rerun `run.id=1`?

If you run this again:

java -jar build/libs/spring-batch-orders-lab-0.0.1-SNAPSHOT.jar run.id=1,java.lang.Long,true

Spring Batch rejects the run, typically with a JobInstanceAlreadyCompleteException, because the job instance identified by importOrdersJob + run.id=1 has already completed.

That behavior comes from Batch metadata, not from the business table.
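
As a toy illustration of that point, the sketch below keeps a set of completed identities and rejects a repeat. It stands in for the check that Spring Batch's JobRepository performs against the metadata tables. The class and method names are hypothetical, and the sketch simply assumes every accepted run completes successfully.

```java
import java.util.HashSet;
import java.util.Set;

// Toy stand-in for the metadata check; the real check is done by Spring Batch's JobRepository.
final class CompletedInstanceSketch {

    // Identities of job instances that already finished with COMPLETED status.
    private final Set<String> completed = new HashSet<>();

    // Returns true if the run was accepted, false if it was rejected as already complete.
    boolean tryRun(String jobName, long runId) {
        String identity = jobName + "+run.id=" + runId;
        if (completed.contains(identity)) {
            return false; // rejected from metadata alone; business rows are never consulted
        }
        completed.add(identity); // assumption: the accepted run completes successfully
        return true;
    }

    public static void main(String[] args) {
        CompletedInstanceSketch metadata = new CompletedInstanceSketch();
        System.out.println(metadata.tryRun("importOrdersJob", 1)); // true
        System.out.println(metadata.tryRun("importOrdersJob", 2)); // true
        System.out.println(metadata.tryRun("importOrdersJob", 1)); // false
    }
}
```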

Be careful with this lab’s reset design: schema.sql resets the business table at app startup before the job attempts to run. If you intentionally test a rejected rerun, run again afterward with a new run.id before inspecting business rows.

The teaching point is the separation:

Question	Controlled by
“Has this job instance already completed?”	Spring Batch metadata in `BATCH_*` tables
“Will business rows be duplicated?”	Business schema and writer strategy
“What does this lab reset automatically?”	`processed_orders`, not `BATCH_*` metadata
“When do I recreate Batch metadata tables?”	Only after a full lab reset, using `--spring.batch.jdbc.initialize-schema=always`

11. Trace the Execution Model

Now that the lab has run, connect the pieces.

The application starts here:

BatchOrdersApplication.main()

Then Spring Boot creates the application context.

Because the app has one configured job named importOrdersJob, Boot launches that job on startup.

The execution flow is:

1. Terminal starts the Spring Boot jar.
2. Spring Boot creates the application context.
3. Spring Batch infrastructure is available.
4. Boot launches importOrdersJob with JobParameters.
5. JobRepository creates or finds the JobInstance.
6. JobRepository records a JobExecution.
7. importOrdersJob starts importOrdersStep.
8. importOrdersStep opens the CSV reader.
9. The reader reads one OrderInput at a time.
10. The processor converts each OrderInput into a ProcessedOrder.
11. The step groups processed items into chunks of 3.
12. JdbcBatchItemWriter inserts each chunk into processed_orders.
13. JobRepository records StepExecution counts and final status.
14. The job completes.
15. The Spring Boot process exits.
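
The chunk portion of the flow above (steps 8 to 13) can be sketched as a plain-Java loop. This is a simplified model under stated assumptions, not Spring Batch's implementation: strings stand in for OrderInput and ProcessedOrder, a list stands in for processed_orders, and the order IDs are made up.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified model of a chunk-oriented step; NOT Spring Batch's implementation.
final class ChunkLoopSketch {

    // read and written are comparable to READ_COUNT and WRITE_COUNT in the step metadata.
    record Counts(int read, int written, int chunks) {}

    static Counts run(List<String> input, int chunkSize, List<String> businessTable) {
        Iterator<String> reader = input.iterator();
        int read = 0;
        int written = 0;
        int chunks = 0;
        while (reader.hasNext()) {
            // Read one item at a time until the chunk is full or the input ends.
            List<String> items = new ArrayList<>();
            while (items.size() < chunkSize && reader.hasNext()) {
                items.add(reader.next());
                read++;
            }
            // Process each item of the chunk (stand-in for the ItemProcessor).
            List<String> processed = new ArrayList<>();
            for (String item : items) {
                processed.add("processed:" + item);
            }
            // Write the whole chunk at once (stand-in for the JdbcBatchItemWriter).
            businessTable.addAll(processed);
            written += processed.size();
            chunks++;
        }
        return new Counts(read, written, chunks);
    }

    public static void main(String[] args) {
        List<String> businessTable = new ArrayList<>();
        // Hypothetical order IDs standing in for the six CSV rows.
        Counts counts = run(List.of("O-1", "O-2", "O-3", "O-4", "O-5", "O-6"), 3, businessTable);
        System.out.println(counts);               // Counts[read=6, written=6, chunks=2]
        System.out.println(businessTable.size()); // 6
    }
}
```

The counters live apart from the list of written rows, which mirrors the split between BATCH_STEP_EXECUTION and processed_orders.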

Separate the responsibilities:

Component	Responsibility in this lab
`BatchOrdersApplication`	Starts Spring Boot
`importOrdersJob`	Defines the top-level workflow
`importOrdersStep`	Coordinates chunk-oriented work
`orderReader`	Reads `orders.csv` into `OrderInput`
`orderProcessor`	Converts `OrderInput` into `ProcessedOrder`
`orderWriter`	Inserts `ProcessedOrder` rows into `processed_orders`
`JobRepository`	Stores metadata in `BATCH_*` tables
`JobParameters`	Identify the logical job instance

The shortest accurate mental model is:

Job
  contains Step
    coordinates Reader -> Processor -> Writer
 
JobRepository
  stores execution metadata
 
Business table
  stores domain output

12. What This Lab Deliberately Leaves Out

This first lab stops at the core model.

It does not teach:

skip/retry
restartability after failure
listeners
scheduling
partitioning
parallel processing
production deployment
multiple jobs
multiple data sources

Those are important Spring Batch topics, but they are easier to understand after the first structure is clear:

Job
  contains Step
    coordinates Reader -> Processor -> Writer
 
JobRepository
  stores execution metadata
 
Business tables
  store domain output

Follow-up note: `RunIdIncrementer`

Many examples use RunIdIncrementer to generate or increment a run.id parameter.

That is useful, but this lab avoids it in the main path because explicit commands make the identity rule easier to see:

java -jar build/libs/spring-batch-orders-lab-0.0.1-SNAPSHOT.jar --spring.batch.jdbc.initialize-schema=always run.id=1,java.lang.Long,true
java -jar build/libs/spring-batch-orders-lab-0.0.1-SNAPSHOT.jar run.id=2,java.lang.Long,true

Once you understand that JobInstance identity depends on identifying JobParameters, RunIdIncrementer becomes a convenience, not magic.
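
To make the "convenience, not magic" point concrete, here is a plain-Java model of the idea behind an incrementing run.id. It is a simplified sketch, not the RunIdIncrementer source, and the class and method names are hypothetical. In Spring Batch itself, an incrementer is attached through the job builder's incrementer(...) method.

```java
// Simplified model of an incrementing run.id; NOT the RunIdIncrementer source.
final class RunIdSketch {

    // Starts at 1 when there is no previous run.id, otherwise returns previous + 1.
    static long nextRunId(Long previousRunId) {
        return previousRunId == null ? 1L : previousRunId + 1;
    }

    public static void main(String[] args) {
        Long previous = null;
        for (int launch = 0; launch < 3; launch++) {
            previous = nextRunId(previous);
            // Each launch gets a new identifying value, so each is a new job instance.
            System.out.println("importOrdersJob + run.id=" + previous);
        }
    }
}
```

Each generated value does what the explicit run.id=1 and run.id=2 commands do by hand.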

Optional extension: export processed rows to CSV

After the main DB-writing lab, a good extension is:

processed_orders table
        |
        v
database reader
        |
        v
optional processor
        |
        v
FlatFileItemWriter
        |
        v
build/output/processed-orders.csv

Do not add this to the main path yet.

The main lab’s purpose is to understand one complete CSV-to-database batch job. CSV export is a good second lab because it shows that changing the output destination changes the writer, but not the basic Spring Batch structure.

Learning Outcomes Checklist

After completing this lab, you should be able to answer these questions.

What is the difference between a `Job` and a `Step`?

A Job is the named top-level batch workflow. A Step is one executable phase inside that workflow.

In this lab:

Job:  importOrdersJob
Step: importOrdersStep

The job owns the overall workflow identity. The step owns the chunk-oriented work.

Why does chunk-oriented processing need a reader, processor, and writer?

Because chunk-oriented processing separates three responsibilities:

ItemReader
  reads input items
 
ItemProcessor
  transforms input items into output items
 
ItemWriter
  writes output items

In this lab:

FlatFileItemReader<OrderInput>
  reads CSV rows
 
ItemProcessor<OrderInput, ProcessedOrder>
  classifies orders
 
JdbcBatchItemWriter<ProcessedOrder>
  inserts rows into processed_orders

What does `JobRepository` store?

JobRepository stores Spring Batch execution metadata.

Examples:

job instance identity
job execution status
job parameters
step execution status
read count
write count
start and end times

It does not store the processed order business rows.

Why are Batch metadata tables different from business data tables?

Because they answer different questions.

processed_orders answers:

What business data did the job produce?

BATCH_* tables answer:

What batch job ran, with what parameters, through which steps, and with what status?

They may live in the same database during the lab, but they have different owners and meanings.

Why can a Batch app start, run, and exit?

Because a Spring Batch app does not have to be a web server.

In this lab:

Spring Boot starts
  -> application context is created
  -> the Job runs on startup
  -> the Step reads, processes, and writes
  -> metadata is recorded
  -> the Job completes
  -> the process exits

Exiting after completion is success, not failure.

Why do `JobParameters` affect reruns?

Because identifying JobParameters participate in JobInstance identity.

Simplified rule:

JobInstance = Job name + identifying JobParameters

So:

importOrdersJob + run.id=1

and:

importOrdersJob + run.id=2

are different job instances.

Changing run.id changes Batch identity. It does not automatically decide how your business table handles duplicate rows.

Version-Risk Checklist

Use this checklist when comparing this lab with older tutorials.

Java and Spring versions

This lab targets Java 17+.
This lab targets Spring Boot 3.5.x.
This lab targets Spring Batch 5.2.x.
Do not silently move the article to Spring Boot 4.x or Spring Batch 6.x.

Builder APIs

Avoid old examples that use:

JobBuilderFactory
StepBuilderFactory

This lab uses current direct builders:

new JobBuilder("importOrdersJob", jobRepository)
new StepBuilder("importOrdersStep", jobRepository)

Chunk configuration

Avoid old examples that call:

chunk(100)

This lab uses the Spring Batch 5 style:

chunk(3, transactionManager)

`@EnableBatchProcessing`

Do not add @EnableBatchProcessing to this beginner lab.

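Spring Boot auto-configures the Batch infrastructure on its own. In Spring Boot 3.x, adding @EnableBatchProcessing makes Boot’s Batch auto-configuration back off, so Boot-managed behavior that this lab relies on, such as Batch schema initialization and launching the job at startup, no longer applies automatically.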

Job parameters

Batch job parameters should be passed without --:

java -jar build/libs/spring-batch-orders-lab-0.0.1-SNAPSHOT.jar run.id=2,java.lang.Long,true

Spring Boot application properties use --:

java -jar build/libs/spring-batch-orders-lab-0.0.1-SNAPSHOT.jar --spring.batch.jdbc.initialize-schema=always run.id=1,java.lang.Long,true

Do not confuse those two channels.
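
The sketch below illustrates the two channels in plain Java. It is not Spring Boot's or Spring Batch's actual parser. It assumes only that arguments starting with -- are treated as application properties and that everything else uses the full name=value,type,identifying form shown in this lab. All class, record, and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the two argument channels; not Spring Boot's or Spring Batch's parser.
final class ArgChannelsSketch {

    record JobParam(String name, String value, String type, boolean identifying) {}

    record Split(List<String> bootProperties, List<JobParam> jobParameters) {}

    // Parses the full name=value,type,identifying form used in this lab.
    static JobParam parseJobParam(String arg) {
        int eq = arg.indexOf('=');
        String[] parts = arg.substring(eq + 1).split(",");
        if (eq < 1 || parts.length != 3) {
            throw new IllegalArgumentException("Expected name=value,type,identifying but got: " + arg);
        }
        return new JobParam(arg.substring(0, eq), parts[0], parts[1], Boolean.parseBoolean(parts[2]));
    }

    // Arguments starting with "--" go to the property channel; the rest become job parameters.
    static Split split(String[] args) {
        List<String> bootProperties = new ArrayList<>();
        List<JobParam> jobParameters = new ArrayList<>();
        for (String arg : args) {
            if (arg.startsWith("--")) {
                bootProperties.add(arg);
            } else {
                jobParameters.add(parseJobParam(arg));
            }
        }
        return new Split(bootProperties, jobParameters);
    }

    public static void main(String[] args) {
        Split correct = split(new String[] {
                "--spring.batch.jdbc.initialize-schema=always", "run.id=1,java.lang.Long,true"});
        System.out.println(correct.jobParameters().size()); // 1

        // "--run.id=1" lands in the property channel, so the job receives no run.id parameter.
        Split incorrect = split(new String[] {"--run.id=1"});
        System.out.println(incorrect.jobParameters().size()); // 0
    }
}
```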

H2 and metadata initialization

This lab uses file-based H2 so you can inspect tables after the app exits.

The reset behavior is deliberate:

Reset type	How
Business table reset	`schema.sql` drops and recreates `processed_orders` on startup
Batch metadata initialization	The lab configuration sets `spring.batch.jdbc.initialize-schema: never` (Spring Boot’s own default is `embedded`)
First run after full reset	Use `--spring.batch.jdbc.initialize-schema=always` once
Later runs	Do not initialize Batch metadata again
Full lab reset	Delete `data/h2`

Row mapping

This lab uses Java records for row models, but avoids fragile automatic bean mapping for CSV input by using an explicit fieldSetMapper.

That keeps the CSV-to-Java mapping visible:

order_id    -> orderId
customer_id -> customerId
amount      -> amount
currency    -> currency
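
As a plain-Java illustration of what "explicit mapping" means, the sketch below maps one comma-separated line to a record by column position, without using Spring Batch's FieldSet API. The field types, the nested OrderInput stand-in, and the sample line are assumptions for illustration. Use the types from the lab's own OrderInput record.

```java
import java.math.BigDecimal;

// Plain-Java illustration of explicit column-to-field mapping; not the lab's fieldSetMapper.
final class ExplicitMappingSketch {

    // Stand-in for the lab's row model; the field types here are assumptions.
    record OrderInput(String orderId, String customerId, BigDecimal amount, String currency) {}

    // Maps columns in the order: order_id, customer_id, amount, currency.
    static OrderInput mapLine(String line) {
        String[] cols = line.split(",", -1);
        if (cols.length != 4) {
            throw new IllegalArgumentException("Expected 4 columns but got " + cols.length);
        }
        return new OrderInput(
                cols[0].trim(),                 // order_id    -> orderId
                cols[1].trim(),                 // customer_id -> customerId
                new BigDecimal(cols[2].trim()), // amount      -> amount
                cols[3].trim());                // currency    -> currency
    }

    public static void main(String[] args) {
        // Hypothetical sample row, not taken from the lab's orders.csv.
        OrderInput order = mapLine("O-1001,C-001,125.50,USD");
        System.out.println(order.orderId());  // O-1001
        System.out.println(order.amount());   // 125.50
    }
}
```

Because every column is assigned by hand, a renamed or reordered CSV column shows up as a visible change in the mapping code.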

Common Beginner Confusions This Lab Prevents

“Spring Batch metadata is my business data.”

No.

Batch metadata records that a job ran. Business tables store what the job produced.

“`JobRepository` stores my processed orders.”

No.

JobRepository stores job and step execution metadata. The JdbcBatchItemWriter writes processed orders into processed_orders.

“A `Job` reads the CSV.”

Not directly.

The job defines the workflow. The reader reads the CSV inside a step.

“A `Step` is the same thing as a processor.”

No.

The step coordinates the chunk flow. The processor transforms one item at a time.

“Chunk size means total row count.”

No.

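Chunk size is the number of items handled per transaction: the step reads and processes up to that many items, then writes and commits them together. With six rows and a chunk size of 3, this lab writes two chunks.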

“If the app exits, something went wrong.”

No.

A non-web Batch app is expected to exit after the job completes.

“Changing `run.id` prevents duplicate business rows.”

No.

Changing run.id changes Batch job identity. Duplicate business rows are controlled by your business schema and writer strategy.

“`schema.sql` resets everything.”

No.

In this lab, schema.sql resets only the business table processed_orders.

It does not reset Spring Batch metadata tables.

“All command-line arguments use `--`.”

No.

Spring Boot application properties use --.

Spring Batch job parameters in this lab do not.

Correct:

java -jar build/libs/spring-batch-orders-lab-0.0.1-SNAPSHOT.jar --spring.batch.jdbc.initialize-schema=always run.id=1,java.lang.Long,true

Incorrect:

java -jar build/libs/spring-batch-orders-lab-0.0.1-SNAPSHOT.jar --run.id=1

Final Mental Model

The whole lab can be compressed into this structure:

Spring Boot application
  starts a Spring context
  auto-runs one Batch Job
 
Job: importOrdersJob
  top-level workflow identity
 
Step: importOrdersStep
  chunk-oriented execution phase
 
Reader
  orders.csv -> OrderInput
 
Processor
  OrderInput -> ProcessedOrder
 
Writer
  ProcessedOrder -> processed_orders
 
JobRepository
  execution metadata -> BATCH_* tables

If you can explain that structure without mixing up processed_orders and BATCH_*, you have the first durable Spring Batch mental model.

References

These official references were used to keep the lab aligned with Spring Boot 3.5.x and Spring Batch 5.2.x.
