This learning lab is for a reader who knows basic Java and Spring Boot but has never used Spring Batch. The goal is not only to make a batch job run, but to understand the core Spring Batch structure through a small runnable project: a Job containing a chunk-oriented Step that reads orders from a CSV file, classifies each order, writes processed rows into an H2 business table, and records execution metadata in separate Spring Batch metadata tables.
This article is pinned to:
- Java 17+
- Spring Boot 3.5.x
- Spring Batch 5.2.x
- Gradle
- H2
- Java configuration style
The examples use Spring Boot 3.5.15, which manages Spring Batch 5.2.6. Stay within the Spring Boot 3.5.x and Spring Batch 5.2.x line for this lab. Do not upgrade the article to Spring Boot 4.x or Spring Batch 6.x without revisiting the version-risk notes near the end.
1. What This Lab Builds
A Spring Batch application is a Spring Boot application that runs batch work in a structured way. The top-level unit is a Job. A job contains one or more Steps. In a chunk-oriented step, Spring Batch repeatedly reads items, optionally processes them, groups them into chunks, and writes each chunk.
In this lab, you will build this flow:
src/main/resources/orders.csv
|
v
ItemReader<OrderInput>
|
v
ItemProcessor<OrderInput, ProcessedOrder>
|
v
JdbcBatchItemWriter<ProcessedOrder>
|
v
processed_orders business table
At the same time, Spring Batch records execution metadata:
JobRepository
|
v
BATCH_* metadata tables
These two paths must stay separate in your mental model.
| Area | Table examples | Written by | Meaning |
|---|---|---|---|
| Business data | processed_orders | JdbcBatchItemWriter | The actual processed order rows produced by this lab |
| Spring Batch metadata | BATCH_JOB_INSTANCE, BATCH_JOB_EXECUTION, BATCH_STEP_EXECUTION, BATCH_JOB_EXECUTION_PARAMS | JobRepository | Infrastructure records that describe which jobs and steps ran, with what parameters and status |
The most important distinction in this lab is:
JobRepository stores metadata about the batch execution. It does not store your processed orders.
The processed orders go into the business table. The Batch execution history goes into BATCH_* tables.
2. The Core Vocabulary Before Code
Job
A Job is the named top-level batch workflow.
In this lab:
Job: importOrdersJob
Its role is to define the overall batch process and connect that process to Spring Batch’s execution model.
A Job does not read CSV rows directly. It does not classify order amounts directly. It does not insert business rows directly. Those responsibilities belong to components inside the job.
Step
A Step is one executable phase inside a job.
In this lab:
Step: importOrdersStep
This step is chunk-oriented. It connects:
reader -> processor -> writer
The step is where we say:
Read items, process them, group them into chunks, and write each chunk transactionally.
ItemReader
An ItemReader<I> reads one input item at a time.
In this lab:
ItemReader<OrderInput>
It reads rows from orders.csv and turns each CSV row into an OrderInput.
Its boundary:
- It knows how to read the input.
- It should not classify the order.
- It should not insert rows into the database.
ItemProcessor
An ItemProcessor<I, O> transforms one item into another item.
In this lab:
ItemProcessor<OrderInput, ProcessedOrder>
It receives one OrderInput, applies the classification rule, and returns one ProcessedOrder.
Classification rule:
amount >= 1000 -> HIGH_VALUE
otherwise -> STANDARD
Its boundary:
- It owns business transformation.
- It should not know how to read the CSV file.
- It should not know how to write SQL.
ItemWriter
An ItemWriter<O> writes processed items.
In this lab:
JdbcBatchItemWriter<ProcessedOrder>
It receives chunks of processed orders and inserts them into the business table:
processed_orders
Its boundary:
- It writes business output.
- It does not create JobInstance records.
- It does not write Batch metadata tables.
JobRepository
JobRepository is Spring Batch’s metadata store.
It records things like:
- job instances
- job executions
- job parameters
- step executions
- status
- read count
- write count
- start and end times
Its boundary:
- It stores Batch execution metadata.
- It does not store the processed order rows.
This distinction matters because both the business table and the Batch metadata tables can live in the same H2 database during the lab. Same database does not mean same responsibility.
JobParameters
JobParameters are values passed to a job run.
They matter because Spring Batch uses identifying job parameters as part of JobInstance identity.
The simplified rule is:
JobInstance = Job name + identifying JobParameters
So these two runs are different logical job instances:
importOrdersJob + run.id=1
importOrdersJob + run.id=2
This lab uses explicit job parameters instead of RunIdIncrementer so that the identity rule stays visible.
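As a rough illustration of the identity rule, the plain-Java sketch below models a job instance as a job name plus a map of identifying parameters. JobIdentity and identityOf are hypothetical names, not Spring Batch types. Spring Batch itself stores a hashed JOB_KEY in BATCH_JOB_INSTANCE rather than comparing maps like this, so treat it only as a mental model.

```java
import java.util.Map;

// Conceptual model only: not how Spring Batch implements JobInstance lookup.
class JobIdentitySketch {

    // Hypothetical record: job name plus identifying parameters.
    record JobIdentity(String jobName, Map<String, String> identifyingParameters) {
        JobIdentity {
            // Defensive copy; Map equality does not depend on insertion order.
            identifyingParameters = Map.copyOf(identifyingParameters);
        }
    }

    static JobIdentity identityOf(String jobName, Map<String, String> identifyingParameters) {
        return new JobIdentity(jobName, identifyingParameters);
    }

    public static void main(String[] args) {
        JobIdentity first = identityOf("importOrdersJob", Map.of("run.id", "1"));
        JobIdentity second = identityOf("importOrdersJob", Map.of("run.id", "2"));
        JobIdentity firstAgain = identityOf("importOrdersJob", Map.of("run.id", "1"));

        System.out.println(first.equals(second));     // false: run.id differs
        System.out.println(first.equals(firstAgain)); // true: same name, same parameters
    }
}
```

Non-identifying parameters would simply be left out of the map in this model, which is why they do not affect identity.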
3. Create the Project with Spring Initializr
Use Spring Initializr as the primary project creation path.
Create a project with these settings:
| Setting | Value |
|---|---|
| Project | Gradle |
| Language | Java |
| Spring Boot | 3.5.x |
| Group | com.example |
| Artifact | spring-batch-orders-lab |
| Name | spring-batch-orders-lab |
| Package name | com.example.batchorders |
| Packaging | Jar |
| Java | 17 or newer |
Add these dependencies:
- Spring Batch
- JDBC API
- H2 Database
Download the project, unzip it, and enter the project directory:
cd spring-batch-orders-lab
The final project tree will look like this:
spring-batch-orders-lab/
|-- .gitignore
|-- build.gradle
|-- settings.gradle
|-- data/
|   `-- h2/
|       `-- ...
`-- src/
    `-- main/
        |-- java/
        |   `-- com/
        |       `-- example/
        |           `-- batchorders/
        |               |-- BatchOrdersApplication.java
        |               |-- batch/
        |               |   `-- BatchConfig.java
        |               `-- model/
        |                   |-- OrderInput.java
        |                   `-- ProcessedOrder.java
        `-- resources/
            |-- application.yml
            |-- orders.csv
            `-- schema.sql
Create the missing directories:
mkdir -p src/main/java/com/example/batchorders/batch
mkdir -p src/main/java/com/example/batchorders/model
mkdir -p data/h2
Update .gitignore so local database files are not committed:
.gradle/
build/
data/
The data/ directory is local lab state. It contains generated H2 database files, not source code.
Your generated settings.gradle should be equivalent to:
rootProject.name = 'spring-batch-orders-lab'
Your generated build.gradle may differ slightly depending on Spring Initializr, but for this lab it should be equivalent to:
plugins {
    id 'java'
    id 'org.springframework.boot' version '3.5.15'
    id 'io.spring.dependency-management' version '1.1.7'
}

group = 'com.example'
version = '0.0.1-SNAPSHOT'

java {
    toolchain {
        languageVersion = JavaLanguageVersion.of(17)
    }
}

repositories {
    mavenCentral()
}

dependencies {
    implementation 'org.springframework.boot:spring-boot-starter-batch'
    implementation 'org.springframework.boot:spring-boot-starter-jdbc'
    runtimeOnly 'com.h2database:h2'
    testImplementation 'org.springframework.boot:spring-boot-starter-test'
    testImplementation 'org.springframework.batch:spring-batch-test'
}

tasks.named('test') {
    useJUnitPlatform()
}

tasks.register('h2Shell', JavaExec) {
    group = 'verification'
    description = 'Open an H2 shell connected to the lab database'
    classpath = sourceSets.main.runtimeClasspath
    mainClass = 'org.h2.tools.Shell'
    args '-url', 'jdbc:h2:file:./data/h2/ordersdb',
         '-user', 'sa',
         '-password', ''
    standardInput = System.in
}
This build gives the project four important capabilities:
| Part | Role |
|---|---|
| spring-boot-starter-batch | Adds Spring Batch and Boot’s Batch auto-configuration |
| spring-boot-starter-jdbc | Adds JDBC and transaction infrastructure |
| h2 | Provides the local lab database |
| h2Shell task | Lets you inspect the file-based H2 database after the app exits |
Build once to check the project:
./gradlew clean build
Expected result:
BUILD SUCCESSFUL
Use or rename the generated Spring Boot application class as BatchOrdersApplication when you reach the application class step. Do not keep two @SpringBootApplication classes in the project.
4. Configure H2, Startup Execution, and Reset Behavior
Create src/main/resources/application.yml:
spring:
  main:
    web-application-type: none
  datasource:
    url: jdbc:h2:file:./data/h2/ordersdb
    username: sa
    password:
    driver-class-name: org.h2.Driver
  sql:
    init:
      mode: always
  batch:
    job:
      enabled: true
      name: importOrdersJob
    jdbc:
      initialize-schema: never
logging:
  level:
    org.springframework.batch: INFO
This configuration does several separate things.
First, the app is non-web:
spring:
  main:
    web-application-type: none
A batch application does not need to behave like a web server. In this lab, the process should start, run the job, and exit.
Second, H2 is file-based:
jdbc:h2:file:./data/h2/ordersdb
This lets you inspect tables after the Java process exits.
Third, normal SQL initialization is enabled:
spring:
  sql:
    init:
      mode: always
This causes Spring Boot to run schema.sql on startup.
In this lab, schema.sql is responsible only for the business table:
processed_orders
It does not create or reset Spring Batch metadata tables.
Fourth, Batch metadata initialization is disabled by default:
spring:
  batch:
    jdbc:
      initialize-schema: never
This is deliberate.
Because the H2 database is file-based, the BATCH_* metadata tables can remain after the first run. If the application tries to initialize the Batch metadata schema on every run, it may attempt to create tables that already exist.
So the lab uses this strategy:
| Run type | What to do |
|---|---|
| First run after a full lab reset | Temporarily initialize Batch metadata with a Spring Boot command-line property |
| Later runs | Do not initialize Batch metadata again |
| Full reset | Delete data/h2 |
The first run after a full reset will use this command:
java -jar build/libs/spring-batch-orders-lab-0.0.1-SNAPSHOT.jar --spring.batch.jdbc.initialize-schema=always run.id=1,java.lang.Long,true
Later runs will use commands like this:
java -jar build/libs/spring-batch-orders-lab-0.0.1-SNAPSHOT.jar run.id=2,java.lang.Long,true
Notice the two different kinds of command-line argument:
| Argument | Kind | Uses --? | Role |
|---|---|---|---|
| --spring.batch.jdbc.initialize-schema=always | Spring Boot application property | Yes | Temporarily tells Boot to initialize Spring Batch metadata tables |
| run.id=1,java.lang.Long,true | Spring Batch job parameter | No | Identifies this job run as run.id=1 |
This distinction is important:
Boot properties use --
Batch job parameters do not use --
Lab reset strategy
There are two reset levels.
Business table reset
The business table is reset by schema.sql on every application startup.
That means every successful run writes a fresh set of processed orders.
This keeps the main lab deterministic.
Batch metadata persistence
Spring Batch metadata is not reset by schema.sql.
The BATCH_* tables remain in the file-based H2 database, so you can see multiple job instances after running with different run.id values.
Full lab reset
To reset both business data and Batch metadata, delete the H2 files:
rm -rf data/h2
mkdir -p data/h2
PowerShell equivalent:
Remove-Item -Recurse -Force data\h2 -ErrorAction SilentlyContinue
New-Item -ItemType Directory -Force data\h2
A full lab reset deletes both:
- business data
- Spring Batch metadata
A normal application run resets only:
- business data in processed_orders
This separation is deliberate.
5. Add the Business Table and Input CSV
Create src/main/resources/schema.sql:
DROP TABLE IF EXISTS processed_orders;

CREATE TABLE processed_orders (
    order_id VARCHAR(50) NOT NULL PRIMARY KEY,
    customer_id VARCHAR(50) NOT NULL,
    amount DECIMAL(19, 2) NOT NULL,
    currency VARCHAR(3) NOT NULL,
    order_tier VARCHAR(20) NOT NULL,
    processed_at TIMESTAMP NOT NULL
);
This file creates the business table:
processed_orders
It does not create Spring Batch metadata tables.
It does not drop BATCH_JOB_INSTANCE.
It does not drop BATCH_JOB_EXECUTION.
It does not drop BATCH_STEP_EXECUTION.
That means this file is responsible for the application’s domain output, not Spring Batch’s execution history.
The business table columns are:
order_id
customer_id
amount
currency
order_tier
processed_at
The input CSV will not contain order_tier or processed_at. Those values are created during processing.
Create src/main/resources/orders.csv:
order_id,customer_id,amount,currency
O-1001,C-001,249.99,USD
O-1002,C-002,1000.00,USD
O-1003,C-003,1575.50,USD
O-1004,C-001,35.00,EUR
O-1005,C-004,999.99,USD
O-1006,C-005,2500.00,KRW
Before writing Java code, predict the classifications:
| order_id | amount | Expected order_tier |
|---|---|---|
| O-1001 | 249.99 | STANDARD |
| O-1002 | 1000.00 | HIGH_VALUE |
| O-1003 | 1575.50 | HIGH_VALUE |
| O-1004 | 35.00 | STANDARD |
| O-1005 | 999.99 | STANDARD |
| O-1006 | 2500.00 | HIGH_VALUE |
The boundary case is 1000.00.
Because the rule is amount >= 1000, an order with amount exactly 1000.00 is HIGH_VALUE.
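The boundary case can be checked in isolation before any Spring Batch code exists. The sketch below is a minimal stand-alone version of the rule, assuming the same BigDecimal comparison the lab's processor uses later; OrderTierSketch and classify are illustrative names only.

```java
import java.math.BigDecimal;

class OrderTierSketch {

    private static final BigDecimal HIGH_VALUE_THRESHOLD = new BigDecimal("1000");

    static String classify(BigDecimal amount) {
        // compareTo ignores scale, so 1000.00 and 1000 compare as equal.
        return amount.compareTo(HIGH_VALUE_THRESHOLD) >= 0 ? "HIGH_VALUE" : "STANDARD";
    }

    public static void main(String[] args) {
        for (String amount : new String[] {"999.99", "1000.00", "1000.01"}) {
            System.out.println(amount + " -> " + classify(new BigDecimal(amount)));
        }
    }
}
```

Using compareTo rather than equals matters here: BigDecimal.equals also compares scale, so 1000.00 and 1000 would not be considered equal by equals.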
6. Add the Application Class and Row Models
Use or rename the generated application class as src/main/java/com/example/batchorders/BatchOrdersApplication.java:
package com.example.batchorders;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class BatchOrdersApplication {

    public static void main(String[] args) {
        SpringApplication.run(BatchOrdersApplication.class, args);
    }
}
This is the Spring Boot entry point.
It does not manually open the CSV file. It does not manually launch the writer. Spring Boot starts the application context, and Boot’s Batch support launches the configured job on startup.
If Spring Initializr generated a differently named application class, rename it or replace it with this one. Do not leave a second @SpringBootApplication class in the project.
Create src/main/java/com/example/batchorders/model/OrderInput.java:
package com.example.batchorders.model;

import java.math.BigDecimal;

public record OrderInput(
        String orderId,
        String customerId,
        BigDecimal amount,
        String currency
) {
}
OrderInput is the shape of one input row.
It represents this CSV structure:
order_id,customer_id,amount,currency
The Java names use camelCase:
orderId
customerId
amount
currency
The CSV header does not automatically become Java properties. We will explicitly map fields in the reader.
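To make "explicit mapping" concrete, here is a hand-written sketch of what mapping one CSV line by position looks like. It is not the lab's reader, which uses FlatFileItemReader; CsvMappingSketch and mapLine are hypothetical names, the nested OrderInput record only mirrors the lab's record, and the naive split assumes no quoted commas in the data.

```java
import java.math.BigDecimal;

class CsvMappingSketch {

    // Local copy of the lab's input shape so the sketch is self-contained.
    record OrderInput(String orderId, String customerId, BigDecimal amount, String currency) {}

    // Columns are mapped by position; the header text plays no role in the mapping.
    static OrderInput mapLine(String line) {
        String[] fields = line.split(",");
        return new OrderInput(fields[0], fields[1], new BigDecimal(fields[2]), fields[3]);
    }

    public static void main(String[] args) {
        System.out.println(mapLine("O-1001,C-001,249.99,USD"));
    }
}
```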
Create src/main/java/com/example/batchorders/model/ProcessedOrder.java:
package com.example.batchorders.model;

import java.math.BigDecimal;
import java.time.LocalDateTime;

public record ProcessedOrder(
        String orderId,
        String customerId,
        BigDecimal amount,
        String currency,
        String orderTier,
        LocalDateTime processedAt
) {
}
ProcessedOrder is the shape of one output row.
It has everything from OrderInput, plus:
orderTier
processedAt
Those extra values are created by the processor.
Conceptually:
OrderInput
= what arrived from the CSV
ProcessedOrder
= what the batch job produced for the business table
7. Configure the Reader, Processor, Writer, Step, and Job
Create src/main/java/com/example/batchorders/batch/BatchConfig.java:
package com.example.batchorders.batch;

import com.example.batchorders.model.OrderInput;
import com.example.batchorders.model.ProcessedOrder;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;
import org.springframework.jdbc.core.namedparam.MapSqlParameterSource;
import org.springframework.transaction.PlatformTransactionManager;

import javax.sql.DataSource;
import java.math.BigDecimal;
import java.sql.Timestamp;
import java.time.Clock;
import java.time.LocalDateTime;

@Configuration
public class BatchConfig {

    private static final BigDecimal HIGH_VALUE_THRESHOLD = new BigDecimal("1000");

    @Bean
    public Clock clock() {
        return Clock.systemUTC();
    }

    @Bean
    public FlatFileItemReader<OrderInput> orderReader() {
        return new FlatFileItemReaderBuilder<OrderInput>()
                .name("orderReader")
                .resource(new ClassPathResource("orders.csv"))
                .linesToSkip(1)
                .delimited()
                .names("orderId", "customerId", "amount", "currency")
                .fieldSetMapper(fieldSet -> new OrderInput(
                        fieldSet.readString("orderId"),
                        fieldSet.readString("customerId"),
                        fieldSet.readBigDecimal("amount"),
                        fieldSet.readString("currency")
                ))
                .build();
    }

    @Bean
    public ItemProcessor<OrderInput, ProcessedOrder> orderProcessor(Clock clock) {
        return order -> {
            String orderTier = order.amount().compareTo(HIGH_VALUE_THRESHOLD) >= 0
                    ? "HIGH_VALUE"
                    : "STANDARD";
            return new ProcessedOrder(
                    order.orderId(),
                    order.customerId(),
                    order.amount(),
                    order.currency(),
                    orderTier,
                    LocalDateTime.now(clock)
            );
        };
    }

    @Bean
    public JdbcBatchItemWriter<ProcessedOrder> orderWriter(DataSource dataSource) {
        return new JdbcBatchItemWriterBuilder<ProcessedOrder>()
                .dataSource(dataSource)
                .sql("""
                        INSERT INTO processed_orders (
                            order_id,
                            customer_id,
                            amount,
                            currency,
                            order_tier,
                            processed_at
                        )
                        VALUES (
                            :orderId,
                            :customerId,
                            :amount,
                            :currency,
                            :orderTier,
                            :processedAt
                        )
                        """)
                .itemSqlParameterSourceProvider(order -> new MapSqlParameterSource()
                        .addValue("orderId", order.orderId())
                        .addValue("customerId", order.customerId())
                        .addValue("amount", order.amount())
                        .addValue("currency", order.currency())
                        .addValue("orderTier", order.orderTier())
                        .addValue("processedAt", Timestamp.valueOf(order.processedAt())))
                .build();
    }

    @Bean
    public Step importOrdersStep(
            JobRepository jobRepository,
            PlatformTransactionManager transactionManager,
            ItemReader<OrderInput> orderReader,
            ItemProcessor<OrderInput, ProcessedOrder> orderProcessor,
            ItemWriter<ProcessedOrder> orderWriter
    ) {
        return new StepBuilder("importOrdersStep", jobRepository)
                .<OrderInput, ProcessedOrder>chunk(3, transactionManager)
                .reader(orderReader)
                .processor(orderProcessor)
                .writer(orderWriter)
                .build();
    }

    @Bean
    public Job importOrdersJob(
            JobRepository jobRepository,
            Step importOrdersStep
    ) {
        return new JobBuilder("importOrdersJob", jobRepository)
                .start(importOrdersStep)
                .build();
    }
}
This one configuration class defines the whole Batch structure.
Read it in layers.
Reader checkpoint
@Bean
public FlatFileItemReader<OrderInput> orderReader() {
    return new FlatFileItemReaderBuilder<OrderInput>()
            .name("orderReader")
            .resource(new ClassPathResource("orders.csv"))
            .linesToSkip(1)
            .delimited()
            .names("orderId", "customerId", "amount", "currency")
            .fieldSetMapper(fieldSet -> new OrderInput(
                    fieldSet.readString("orderId"),
                    fieldSet.readString("customerId"),
                    fieldSet.readBigDecimal("amount"),
                    fieldSet.readString("currency")
            ))
            .build();
}
The reader owns the input boundary.
It answers:
Where is the input, and how does one raw record become one Java input item?
Important details:
- resource(new ClassPathResource("orders.csv")) reads orders.csv from the classpath; in the project it lives at src/main/resources/orders.csv.
- linesToSkip(1) skips the CSV header.
- names(...) assigns field names used by the mapper.
- fieldSetMapper(...) creates an OrderInput.
The reader does not classify the amount. The reader does not write to the database.
Processor checkpoint
@Bean
public ItemProcessor<OrderInput, ProcessedOrder> orderProcessor(Clock clock) {
    return order -> {
        String orderTier = order.amount().compareTo(HIGH_VALUE_THRESHOLD) >= 0
                ? "HIGH_VALUE"
                : "STANDARD";
        return new ProcessedOrder(
                order.orderId(),
                order.customerId(),
                order.amount(),
                order.currency(),
                orderTier,
                LocalDateTime.now(clock)
        );
    };
}
The processor owns item transformation.
It answers:
Given one valid input item, what business output item should be produced?
The processor:
- receives one OrderInput
- compares amount with 1000
- creates orderTier
- adds processedAt
- returns one ProcessedOrder
The processor does not know where the CSV file is. It does not execute SQL.
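One consequence of passing a Clock into the processor is that the timestamp becomes predictable when needed. The sketch below shows the idea with a fixed clock; FixedClockSketch and the processedAt helper are hypothetical names, and the chosen instant is arbitrary.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

class FixedClockSketch {

    // Same call the lab's processor makes, isolated for illustration.
    static LocalDateTime processedAt(Clock clock) {
        return LocalDateTime.now(clock);
    }

    public static void main(String[] args) {
        // A fixed clock always reports the same instant, which makes the result repeatable.
        Clock fixed = Clock.fixed(Instant.parse("2025-01-01T00:00:00Z"), ZoneOffset.UTC);
        System.out.println(processedAt(fixed));
        System.out.println(processedAt(fixed).equals(processedAt(fixed))); // true
    }
}
```

With Clock.systemUTC(), as configured in the lab, the same call returns the current UTC time instead.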
Writer checkpoint
@Bean
public JdbcBatchItemWriter<ProcessedOrder> orderWriter(DataSource dataSource) {
    return new JdbcBatchItemWriterBuilder<ProcessedOrder>()
            .dataSource(dataSource)
            .sql("""
                    INSERT INTO processed_orders (
                        order_id,
                        customer_id,
                        amount,
                        currency,
                        order_tier,
                        processed_at
                    )
                    VALUES (
                        :orderId,
                        :customerId,
                        :amount,
                        :currency,
                        :orderTier,
                        :processedAt
                    )
                    """)
            .itemSqlParameterSourceProvider(order -> new MapSqlParameterSource()
                    .addValue("orderId", order.orderId())
                    .addValue("customerId", order.customerId())
                    .addValue("amount", order.amount())
                    .addValue("currency", order.currency())
                    .addValue("orderTier", order.orderTier())
                    .addValue("processedAt", Timestamp.valueOf(order.processedAt())))
            .build();
}
The writer owns the output boundary.
It answers:
Where do processed items go?
In this lab, processed items go into:
processed_orders
The named parameters in the SQL refer to ProcessedOrder values:
| SQL parameter | Comes from |
|---|---|
| :orderId | ProcessedOrder.orderId() |
| :customerId | ProcessedOrder.customerId() |
| :amount | ProcessedOrder.amount() |
| :currency | ProcessedOrder.currency() |
| :orderTier | ProcessedOrder.orderTier() |
| :processedAt | ProcessedOrder.processedAt() |
This writer inserts business rows.
It does not insert into BATCH_JOB_INSTANCE.
It does not insert into BATCH_STEP_EXECUTION.
Those metadata writes belong to JobRepository.
Step checkpoint
@Bean
public Step importOrdersStep(
        JobRepository jobRepository,
        PlatformTransactionManager transactionManager,
        ItemReader<OrderInput> orderReader,
        ItemProcessor<OrderInput, ProcessedOrder> orderProcessor,
        ItemWriter<ProcessedOrder> orderWriter
) {
    return new StepBuilder("importOrdersStep", jobRepository)
            .<OrderInput, ProcessedOrder>chunk(3, transactionManager)
            .reader(orderReader)
            .processor(orderProcessor)
            .writer(orderWriter)
            .build();
}
The reader, processor, and writer do not run by themselves. They become executable batch work when connected inside a Step.
This is current Spring Batch 5 style.
Notice what it does not use:
JobBuilderFactory
StepBuilderFactory
Older tutorials often use those factory classes. This lab does not.
The key line is:
.<OrderInput, ProcessedOrder>chunk(3, transactionManager)
This means:
Read and process up to 3 items, then write those processed items as one chunk inside a transaction boundary.
Chunk size is not the total number of rows in the job.
Chunk size is the grouping size for processing, writing, and committing.
With 6 input rows and chunk size 3, the step should write two chunks:
chunk 1: rows 1, 2, 3
chunk 2: rows 4, 5, 6
The JobRepository records step execution metadata, such as read and write counts. The PlatformTransactionManager controls the transaction around chunk work.
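The chunk grouping can be pictured with a small plain-Java loop. This is a deliberately simplified model, not Spring Batch's implementation: it leaves out transactions, metadata updates, and failure handling, and ChunkLoopSketch and run are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

class ChunkLoopSketch {

    // Returns the chunks that would be handed to the writer, in order.
    static <I, O> List<List<O>> run(Iterator<I> reader, Function<I, O> processor, int chunkSize) {
        List<List<O>> writtenChunks = new ArrayList<>();
        while (reader.hasNext()) {
            // Read up to chunkSize items, or fewer if the input runs out.
            List<I> inputs = new ArrayList<>();
            while (inputs.size() < chunkSize && reader.hasNext()) {
                inputs.add(reader.next());
            }
            // Process each item that was read.
            List<O> outputs = new ArrayList<>();
            for (I input : inputs) {
                outputs.add(processor.apply(input));
            }
            // "Write" the chunk; in the lab this corresponds to one writer call per chunk.
            writtenChunks.add(outputs);
        }
        return writtenChunks;
    }

    public static void main(String[] args) {
        List<String> orderIds = List.of("O-1001", "O-1002", "O-1003", "O-1004", "O-1005", "O-1006");
        List<List<String>> chunks = run(orderIds.iterator(), id -> id + ":processed", 3);
        for (List<String> chunk : chunks) {
            System.out.println("wrote chunk of " + chunk.size() + ": " + chunk);
        }
    }
}
```

With six inputs and a chunk size of 3 the loop produces two chunks; with seven inputs it would produce a third chunk containing a single item.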
Job checkpoint
@Bean
public Job importOrdersJob(
        JobRepository jobRepository,
        Step importOrdersStep
) {
    return new JobBuilder("importOrdersJob", jobRepository)
            .start(importOrdersStep)
            .build();
}
The job is the top-level workflow:
importOrdersJob
`-- importOrdersStep
    |-- orderReader
    |-- orderProcessor
    `-- orderWriter
This lab has only one step, but Job and Step are still different concepts.
The job owns the top-level identity. The step owns one executable phase of work.
8. Build and Run the First Job Instance
Build the jar:
./gradlew clean bootJar
Expected result:
BUILD SUCCESSFUL
If your project name or version differs, the jar name may differ. With the build.gradle shown above, the jar path is:
build/libs/spring-batch-orders-lab-0.0.1-SNAPSHOT.jar
Because this is the first run after creating or resetting the file-based H2 database, initialize the Spring Batch metadata schema once:
java -jar build/libs/spring-batch-orders-lab-0.0.1-SNAPSHOT.jar --spring.batch.jdbc.initialize-schema=always run.id=1,java.lang.Long,true
This command contains two different kinds of arguments:
--spring.batch.jdbc.initialize-schema=always is a Spring Boot application property.
It temporarily overrides the default in application.yml and tells Boot to create the Spring Batch metadata tables.
run.id=1,java.lang.Long,true is a Spring Batch job parameter.
It identifies this job instance as run.id=1.
The Batch job parameter format is:
key=value,type,identifying
So this:
run.id=1,java.lang.Long,true
means:
| Part | Meaning |
|---|---|
| run.id | parameter name |
| 1 | parameter value |
| java.lang.Long | parameter type |
| true | identifying parameter |
Because run.id is identifying, it participates in JobInstance identity.
Do not write the job parameter as:
--run.id=1
That would look like a Spring Boot application property. In this lab, Batch job parameters do not use --.
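The two argument kinds and the parameter format can be pulled apart by hand to see what each piece means. The sketch below is an illustration only, not the parsing code Spring Boot or Spring Batch actually runs; JobArgumentSketch, ParsedParameter, jobParameterArguments, and parse are hypothetical names, and parse assumes the full three-part form used in this lab.

```java
import java.util.ArrayList;
import java.util.List;

class JobArgumentSketch {

    // Hypothetical holder for the parts of "name=value,type,identifying".
    record ParsedParameter(String name, String value, String type, boolean identifying) {}

    // Arguments starting with "--" are treated as Boot properties; the rest as job parameters.
    static List<String> jobParameterArguments(String[] args) {
        List<String> result = new ArrayList<>();
        for (String arg : args) {
            if (!arg.startsWith("--")) {
                result.add(arg);
            }
        }
        return result;
    }

    // Assumes value, type, and identifying flag are all present, as in this lab's commands.
    static ParsedParameter parse(String argument) {
        int equalsIndex = argument.indexOf('=');
        String name = argument.substring(0, equalsIndex);
        String[] parts = argument.substring(equalsIndex + 1).split(",");
        return new ParsedParameter(name, parts[0], parts[1], Boolean.parseBoolean(parts[2]));
    }

    public static void main(String[] args) {
        String[] commandLine = {
            "--spring.batch.jdbc.initialize-schema=always",
            "run.id=1,java.lang.Long,true"
        };
        for (String argument : jobParameterArguments(commandLine)) {
            System.out.println(parse(argument));
        }
    }
}
```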
A successful run should include logs similar to:
Job: [SimpleJob: [name=importOrdersJob]] launched
Executing step: [importOrdersStep]
Step: [importOrdersStep] executed
Job: [SimpleJob: [name=importOrdersJob]] completed
The exact timestamps and thread names will differ.
The application exits after the job completes. That is expected.
This app is not a web server. It starts, runs one batch job, records metadata, writes business output, and exits.
Optional bootRun form
The jar command above is the canonical command for this lab.
If you use Gradle bootRun, quote the argument string so Gradle passes it correctly:
./gradlew bootRun --args='--spring.batch.jdbc.initialize-schema=always run.id=1,java.lang.Long,true'
PowerShell usually uses double quotes:
./gradlew bootRun --args="--spring.batch.jdbc.initialize-schema=always run.id=1,java.lang.Long,true"
Inside the quoted string, the same distinction still holds:
--spring.batch.jdbc.initialize-schema=always
= Boot property
run.id=1,java.lang.Long,true
= Batch job parameter
9. Inspect the Business Data and Batch Metadata
The application has exited, but the H2 database is file-based, so you can inspect it.
Open the H2 shell:
./gradlew -q h2Shell
You should get an interactive prompt like:
sql>
Check the business table
Run:
SELECT COUNT(*) AS processed_order_count
FROM processed_orders;
Expected count:
6
Then inspect the rows:
SELECT order_id, customer_id, amount, currency, order_tier, processed_at
FROM processed_orders
ORDER BY order_id;
Expected business result:
| order_id | amount | order_tier |
|---|---|---|
| O-1001 | 249.99 | STANDARD |
| O-1002 | 1000.00 | HIGH_VALUE |
| O-1003 | 1575.50 | HIGH_VALUE |
| O-1004 | 35.00 | STANDARD |
| O-1005 | 999.99 | STANDARD |
| O-1006 | 2500.00 | HIGH_VALUE |
This proves that the reader, processor, and writer pipeline worked:
CSV row -> OrderInput -> ProcessedOrder -> processed_orders row
These are business rows. They are not Batch metadata rows.
Check the Batch metadata tables
Now inspect the metadata tables.
Run:
SELECT JOB_INSTANCE_ID, JOB_NAME, JOB_KEY
FROM BATCH_JOB_INSTANCE;
This table records logical job instances.
Then run:
SELECT JOB_EXECUTION_ID, JOB_INSTANCE_ID, STATUS, START_TIME, END_TIME
FROM BATCH_JOB_EXECUTION
ORDER BY JOB_EXECUTION_ID;
This table records actual executions of job instances.
Then run:
SELECT STEP_EXECUTION_ID, STEP_NAME, STATUS, READ_COUNT, WRITE_COUNT
FROM BATCH_STEP_EXECUTION
ORDER BY STEP_EXECUTION_ID;
Expected important values:
STEP_NAME = importOrdersStep
STATUS = COMPLETED
READ_COUNT = 6
WRITE_COUNT = 6
Finally, inspect job parameters:
SELECT JOB_EXECUTION_ID, PARAMETER_NAME, PARAMETER_TYPE, PARAMETER_VALUE, IDENTIFYING
FROM BATCH_JOB_EXECUTION_PARAMS
ORDER BY JOB_EXECUTION_ID, PARAMETER_NAME;
You should see run.id.
The important distinction is:
processed_orders
= business result
BATCH_* tables
= Spring Batch execution metadata
READ_COUNT = 6 does not mean Spring Batch stored your six orders in BATCH_STEP_EXECUTION.
It means the step execution metadata says:
This step read six items.
The actual processed order rows are in processed_orders.
Exit H2 shell:
quit
10. Rerun with a Different JobParameter
For later runs, do not initialize the Batch metadata schema again.
Run the application with a different identifying parameter:
java -jar build/libs/spring-batch-orders-lab-0.0.1-SNAPSHOT.jar run.id=2,java.lang.Long,true
This creates a different JobInstance because the identifying parameter changed.
The identity rule is:
JobInstance = Job name + identifying JobParameters
So these are different:
importOrdersJob + run.id=1
importOrdersJob + run.id=2
Open H2 shell again:
./gradlew -q h2Shell
Check business rows:
SELECT COUNT(*) AS processed_order_count
FROM processed_orders;
Expected:
6
Why not 12?
Because this lab resets only the business table on each application startup through schema.sql.
Now check metadata:
SELECT JOB_INSTANCE_ID, JOB_NAME
FROM BATCH_JOB_INSTANCE
ORDER BY JOB_INSTANCE_ID;
You should now see more than one job instance.
Check job parameters:
SELECT JOB_EXECUTION_ID, PARAMETER_NAME, PARAMETER_TYPE, PARAMETER_VALUE, IDENTIFYING
FROM BATCH_JOB_EXECUTION_PARAMS
ORDER BY JOB_EXECUTION_ID, PARAMETER_NAME;
You should see both parameter values across executions:
run.id = 1
run.id = 2
This is the core rerun lesson:
Changing run.id changes the Batch job identity. It does not mean the business table itself manages duplicate rows.
In this lab, duplicate business rows are avoided by resetting processed_orders at startup. That is a lab simplification. In a production design, you would make an explicit business decision about whether writes should insert, update, merge, ignore duplicates, or fail.
What if you rerun run.id=1?
If you run this again:
java -jar build/libs/spring-batch-orders-lab-0.0.1-SNAPSHOT.jar run.id=1,java.lang.Long,true
Spring Batch may reject the run because the job instance identified by importOrdersJob + run.id=1 has already completed.
That behavior comes from Batch metadata, not from the business table.
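A tiny in-memory model can make the "already completed" check easier to picture. It is only a sketch under simplifying assumptions: every accepted run is treated as having completed successfully, and CompletedInstanceSketch and launch are hypothetical names. In Spring Batch the check is made against the metadata tables and a rejected launch is reported as a JobInstanceAlreadyCompleteException.

```java
import java.util.HashSet;
import java.util.Set;

class CompletedInstanceSketch {

    // Stands in for the completed job instances recorded in the BATCH_* tables.
    private final Set<String> completedInstances = new HashSet<>();

    // Returns true if the run is accepted, false if that instance already completed.
    boolean launch(String jobName, long runId) {
        String identity = jobName + "|run.id=" + runId;
        // Set.add returns false when the identity is already present.
        return completedInstances.add(identity);
    }

    public static void main(String[] args) {
        CompletedInstanceSketch metadata = new CompletedInstanceSketch();
        System.out.println(metadata.launch("importOrdersJob", 1)); // true: first run
        System.out.println(metadata.launch("importOrdersJob", 2)); // true: new identity
        System.out.println(metadata.launch("importOrdersJob", 1)); // false: already completed
    }
}
```

Nothing in this model looks at processed_orders, which mirrors the point above: the decision is made from execution metadata alone.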
Be careful with this lab’s reset design: schema.sql resets the business table at app startup before the job attempts to run. If you intentionally test a rejected rerun, run again afterward with a new run.id before inspecting business rows.
The teaching point is the separation:
| Question | Controlled by |
|---|---|
| “Has this job instance already completed?” | Spring Batch metadata in BATCH_* tables |
| “Will business rows be duplicated?” | Business schema and writer strategy |
| “What does this lab reset automatically?” | processed_orders, not BATCH_* metadata |
| “When do I recreate Batch metadata tables?” | Only after a full lab reset, using --spring.batch.jdbc.initialize-schema=always |
11. Trace the Execution Model
Now that the lab has run, connect the pieces.
The application starts here:
BatchOrdersApplication.main()
Then Spring Boot creates the application context.
Because the app has one configured job named importOrdersJob, Boot launches that job on startup.
The execution flow is:
1. Terminal starts the Spring Boot jar.
2. Spring Boot creates the application context.
3. Spring Batch infrastructure is available.
4. Boot launches importOrdersJob with JobParameters.
5. JobRepository creates or finds the JobInstance.
6. JobRepository records a JobExecution.
7. importOrdersJob starts importOrdersStep.
8. importOrdersStep opens the CSV reader.
9. The reader reads one OrderInput at a time.
10. The processor converts each OrderInput into a ProcessedOrder.
11. The step groups processed items into chunks of 3.
12. JdbcBatchItemWriter inserts each chunk into processed_orders.
13. JobRepository records StepExecution counts and final status.
14. The job completes.
15. The Spring Boot process exits.
Separate the responsibilities:
| Component | Responsibility in this lab |
|---|---|
| BatchOrdersApplication | Starts Spring Boot |
| importOrdersJob | Defines the top-level workflow |
| importOrdersStep | Coordinates chunk-oriented work |
| orderReader | Reads orders.csv into OrderInput |
| orderProcessor | Converts OrderInput into ProcessedOrder |
| orderWriter | Inserts ProcessedOrder rows into processed_orders |
| JobRepository | Stores metadata in BATCH_* tables |
| JobParameters | Identify the logical job instance |
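Steps 9 to 12 of the execution flow can be modeled in a few lines of plain Java. This is only a sketch of the chunk loop, not Spring Batch code: the class `ChunkLoopSketch`, the method `runStep`, and the sample order IDs are hypothetical, and the real step also wraps each chunk in a transaction and records counts through the JobRepository.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

public class ChunkLoopSketch {

    // Hypothetical plain-Java model of one chunk-oriented step.
    // Returns the chunks in the order they would be handed to the writer.
    static <I, O> List<List<O>> runStep(Iterator<I> reader, Function<I, O> processor, int chunkSize) {
        List<List<O>> writtenChunks = new ArrayList<>();
        while (true) {
            // Read one item at a time until the chunk is full or the input is exhausted.
            List<I> inputs = new ArrayList<>();
            while (inputs.size() < chunkSize && reader.hasNext()) {
                inputs.add(reader.next());
            }
            if (inputs.isEmpty()) {
                break; // no more input: the step is finished
            }
            // Process each item of the chunk individually.
            List<O> outputs = new ArrayList<>();
            for (I input : inputs) {
                outputs.add(processor.apply(input));
            }
            // "Write" the whole chunk at once (the real writer performs a batch insert here).
            writtenChunks.add(List.copyOf(outputs));
        }
        return writtenChunks;
    }

    public static void main(String[] args) {
        // Seven made-up order IDs; the lab's real input is orders.csv.
        List<String> orderIds = List.of("O-1", "O-2", "O-3", "O-4", "O-5", "O-6", "O-7");
        List<List<String>> chunks = runStep(orderIds.iterator(), id -> id + ":PROCESSED", 3);
        chunks.forEach(System.out::println);
    }
}
```

With seven items and a chunk size of 3, the writer in this model is called three times, with chunks of 3, 3, and 1 items.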
The shortest accurate mental model is:
Job
contains Step
coordinates Reader -> Processor -> Writer
JobRepository
stores execution metadata
Business table
stores domain output
12. What This Lab Deliberately Leaves Out
This first lab stops at the core model.
It does not teach:
- skip/retry
- restartability after failure
- listeners
- scheduling
- partitioning
- parallel processing
- production deployment
- multiple jobs
- multiple data sources
Those are important Spring Batch topics, but they are easier to understand after the first structure is clear:
Job
contains Step
coordinates Reader -> Processor -> Writer
JobRepository
stores execution metadata
Business tables
store domain output
Follow-up note: RunIdIncrementer
Many examples use RunIdIncrementer to generate or increment a run.id parameter.
That is useful, but this lab avoids it in the main path because explicit commands make the identity rule easier to see:
java -jar build/libs/spring-batch-orders-lab-0.0.1-SNAPSHOT.jar --spring.batch.jdbc.initialize-schema=always run.id=1,java.lang.Long,true
java -jar build/libs/spring-batch-orders-lab-0.0.1-SNAPSHOT.jar run.id=2,java.lang.Long,true
Once you understand that JobInstance identity depends on identifying JobParameters, RunIdIncrementer becomes a convenience, not magic.
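To see why an incrementer is only a convenience, here is a plain-Java model of the idea: take the previous run's parameters and produce the next ones with run.id bumped by one. This is a sketch, not the library's source. The class `RunIdSketch` and the method `next` are hypothetical, and the parameters are simplified to a `Map<String, Long>`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RunIdSketch {

    // Hypothetical stand-in for an incrementer:
    // copy the previous parameters and set run.id to the previous value plus one.
    // If there was no previous run.id, this model starts at 1.
    static Map<String, Long> next(Map<String, Long> previous) {
        Map<String, Long> next = new LinkedHashMap<>(previous);
        Long last = previous.get("run.id");
        next.put("run.id", last == null ? 1L : last + 1);
        return next;
    }

    public static void main(String[] args) {
        Map<String, Long> first = next(Map.of());   // plays the role of typing run.id=1 by hand
        Map<String, Long> second = next(first);     // plays the role of typing run.id=2 by hand
        System.out.println(first);
        System.out.println(second);
    }
}
```

Each call yields a new identifying value, which is exactly what the explicit commands above do manually.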
Optional extension: export processed rows to CSV
After the main DB-writing lab, a good extension is:
processed_orders table
|
v
database reader
|
v
optional processor
|
v
FlatFileItemWriter
|
v
build/output/processed-orders.csv
Do not add this to the main path yet.
The main lab’s purpose is to understand one complete CSV-to-database batch job. CSV export is a good second lab because it shows that changing the output destination changes the writer, but not the basic Spring Batch structure.
Learning Outcomes Checklist
After completing this lab, you should be able to answer these questions.
What is the difference between a Job and a Step?
A Job is the named top-level batch workflow. A Step is one executable phase inside that workflow.
In this lab:
Job: importOrdersJob
Step: importOrdersStep
The job owns the overall workflow identity. The step owns the chunk-oriented work.
Why does chunk-oriented processing need a reader, processor, and writer?
Because chunk-oriented processing separates three responsibilities:
ItemReader
reads input items
ItemProcessor
transforms input items into output items
ItemWriter
writes output items
In this lab:
FlatFileItemReader<OrderInput>
reads CSV rows
ItemProcessor<OrderInput, ProcessedOrder>
classifies orders
JdbcBatchItemWriter<ProcessedOrder>
inserts rows into processed_orders
What does JobRepository store?
JobRepository stores Spring Batch execution metadata.
Examples:
- job instance identity
- job execution status
- job parameters
- step execution status
- read count
- write count
- start and end times
It does not store the processed order business rows.
Why are Batch metadata tables different from business data tables?
Because they answer different questions.
processed_orders answers:
What business data did the job produce?
BATCH_* tables answer:
What batch job ran, with what parameters, through which steps, and with what status?
They may live in the same database during the lab, but they have different owners and meanings.
Why can a Batch app start, run, and exit?
Because a Spring Batch app does not have to be a web server.
In this lab:
Spring Boot starts
-> application context is created
-> the Job runs on startup
-> the Step reads, processes, and writes
-> metadata is recorded
-> the Job completes
-> the process exits
Exiting after completion is success, not failure.
Why do JobParameters affect reruns?
Because identifying JobParameters participate in JobInstance identity.
Simplified rule:
JobInstance = Job name + identifying JobParameters
So:
importOrdersJob + run.id=1
and:
importOrdersJob + run.id=2
are different job instances.
Changing run.id changes Batch identity. It does not automatically decide how your business table handles duplicate rows.
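The simplified identity rule can be written down as a small plain-Java model. This is a sketch under stated assumptions, not how Spring Batch stores the key internally: the records `Param` and `InstanceKey`, the method `keyFor`, and the non-identifying `note` parameter are all hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

public class JobInstanceIdentitySketch {

    // Hypothetical model of one job parameter: a value plus the "identifying" flag.
    record Param(Object value, boolean identifying) {}

    // Hypothetical model of the identity: job name plus identifying parameters only.
    record InstanceKey(String jobName, Map<String, Object> identifyingParams) {}

    // Builds the identity key, ignoring every non-identifying parameter.
    static InstanceKey keyFor(String jobName, Map<String, Param> params) {
        Map<String, Object> identifying = new TreeMap<>();
        params.forEach((name, param) -> {
            if (param.identifying()) {
                identifying.put(name, param.value());
            }
        });
        return new InstanceKey(jobName, identifying);
    }

    public static void main(String[] args) {
        InstanceKey run1 = keyFor("importOrdersJob", Map.of("run.id", new Param(1L, true)));
        InstanceKey run2 = keyFor("importOrdersJob", Map.of("run.id", new Param(2L, true)));
        InstanceKey run1WithNote = keyFor("importOrdersJob",
                Map.of("run.id", new Param(1L, true), "note", new Param("retry", false)));

        System.out.println(run1.equals(run2));         // false: different identifying value
        System.out.println(run1.equals(run1WithNote)); // true: the extra parameter is not identifying
    }
}
```

Nothing in this model refers to processed_orders, which mirrors the point above: the identity rule lives entirely on the metadata side.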
Version-Risk Checklist
Use this checklist when comparing this lab with older tutorials.
Java and Spring versions
- This lab targets Java 17+.
- This lab targets Spring Boot 3.5.x.
- This lab targets Spring Batch 5.2.x.
- Do not silently move the article to Spring Boot 4.x or Spring Batch 6.x.
Builder APIs
Avoid old examples that use:
JobBuilderFactory
StepBuilderFactory
This lab uses current direct builders:
new JobBuilder("importOrdersJob", jobRepository)
new StepBuilder("importOrdersStep", jobRepository)
Chunk configuration
Avoid old examples that call:
chunk(100)
This lab uses the Spring Batch 5 style:
chunk(3, transactionManager)
@EnableBatchProcessing
Do not add @EnableBatchProcessing to this beginner lab.
Spring Boot auto-configures the Batch infrastructure. In Spring Boot 3.x, adding @EnableBatchProcessing makes Boot’s Batch auto-configuration back off, which switches off Boot-managed behavior this lab relies on, such as Batch schema initialization and launching the job on startup.
Job parameters
Batch job parameters should be passed without --:
java -jar build/libs/spring-batch-orders-lab-0.0.1-SNAPSHOT.jar run.id=2,java.lang.Long,true
Spring Boot application properties use --:
java -jar build/libs/spring-batch-orders-lab-0.0.1-SNAPSHOT.jar --spring.batch.jdbc.initialize-schema=always run.id=1,java.lang.Long,true
Do not confuse those two channels.
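The two channels can be illustrated with a plain-Java sketch that splits a command line the same way: arguments starting with -- go to one side, everything else is treated as a job parameter in the name=value,type,identifying notation used in this lab. The class `ArgumentChannelsSketch`, its methods, and the `JobParam` record are hypothetical, and the fallback values for a missing type or identifying flag are assumptions of this model, not a statement about the framework's converter.

```java
import java.util.Arrays;
import java.util.List;

public class ArgumentChannelsSketch {

    // Hypothetical model of one parsed job parameter.
    record JobParam(String name, String value, String type, boolean identifying) {}

    // Arguments that start with "--" belong to the Spring Boot property channel.
    static List<String> bootOptionArgs(String[] args) {
        return Arrays.stream(args).filter(arg -> arg.startsWith("--")).toList();
    }

    // Everything else belongs to the job parameter channel.
    static List<String> jobParameterArgs(String[] args) {
        return Arrays.stream(args).filter(arg -> !arg.startsWith("--")).toList();
    }

    // Parses the lab's notation: name=value,type,identifying
    static JobParam parse(String arg) {
        int eq = arg.indexOf('=');
        String name = arg.substring(0, eq);
        String[] parts = arg.substring(eq + 1).split(",");
        // Assumed fallbacks for this sketch when type or flag are omitted.
        String type = parts.length > 1 ? parts[1] : "java.lang.String";
        boolean identifying = parts.length > 2 ? Boolean.parseBoolean(parts[2]) : true;
        return new JobParam(name, parts[0], type, identifying);
    }

    public static void main(String[] args) {
        String[] commandLine = {
            "--spring.batch.jdbc.initialize-schema=always",
            "run.id=1,java.lang.Long,true"
        };
        System.out.println("Boot properties: " + bootOptionArgs(commandLine));
        jobParameterArgs(commandLine).forEach(arg -> System.out.println("Job parameter: " + parse(arg)));
    }
}
```

In this model, an argument written as --run.id=1 lands on the Boot property side, so the job parameter side stays empty.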
H2 and metadata initialization
This lab uses file-based H2 so you can inspect tables after the app exits.
The reset behavior is deliberate:
| Reset type | How |
|---|---|
| Business table reset | schema.sql drops and recreates processed_orders on startup |
| Batch metadata initialization | Default is spring.batch.jdbc.initialize-schema: never |
| First run after full reset | Use --spring.batch.jdbc.initialize-schema=always once |
| Later runs | Do not initialize Batch metadata again |
| Full lab reset | Delete data/h2 |
Row mapping
This lab uses Java records for row models, but avoids fragile automatic bean mapping for CSV input by using an explicit fieldSetMapper.
That keeps the CSV-to-Java mapping visible:
order_id -> orderId
customer_id -> customerId
amount -> amount
currency -> currency
Common Beginner Confusions This Lab Prevents
“Spring Batch metadata is my business data.”
No.
Batch metadata records that a job ran. Business tables store what the job produced.
“JobRepository stores my processed orders.”
No.
JobRepository stores job and step execution metadata. The JdbcBatchItemWriter writes processed orders into processed_orders.
“A Job reads the CSV.”
Not directly.
The job defines the workflow. The reader reads the CSV inside a step.
“A Step is the same thing as a processor.”
No.
The step coordinates the chunk flow. The processor transforms one item at a time.
“Chunk size means total row count.”
No.
Chunk size means how many processed items are grouped before writing and committing.
“If the app exits, something went wrong.”
No.
A non-web Batch app is expected to exit after the job completes.
“Changing run.id prevents duplicate business rows.”
No.
Changing run.id changes Batch job identity. Duplicate business rows are controlled by your business schema and writer strategy.
“schema.sql resets everything.”
No.
In this lab, schema.sql resets only the business table processed_orders.
It does not reset Spring Batch metadata tables.
“All command-line arguments use --.”
No.
Spring Boot application properties use --.
Spring Batch job parameters in this lab do not.
Correct:
java -jar build/libs/spring-batch-orders-lab-0.0.1-SNAPSHOT.jar --spring.batch.jdbc.initialize-schema=always run.id=1,java.lang.Long,true
Incorrect:
java -jar build/libs/spring-batch-orders-lab-0.0.1-SNAPSHOT.jar --run.id=1
Final Mental Model
The whole lab can be compressed into this structure:
Spring Boot application
starts a Spring context
auto-runs one Batch Job
Job: importOrdersJob
top-level workflow identity
Step: importOrdersStep
chunk-oriented execution phase
Reader
orders.csv -> OrderInput
Processor
OrderInput -> ProcessedOrder
Writer
ProcessedOrder -> processed_orders
JobRepository
execution metadata -> BATCH_* tables
If you can explain that structure without mixing up processed_orders and BATCH_*, you have the first durable Spring Batch mental model.
References
These official references were used to keep the lab aligned with Spring Boot 3.5.x and Spring Batch 5.2.x.