Project:
Spring Framework
Code Location:
git://github.com/SpringSource/spring-batch.gitmaster
file-to-file.apt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
------
Copy File to File
------
Dave Syer
------
January 2007
Use Case: Copy File to File
* Goal
Read a file line-by-line and process into a file in a different
format (possibly different number of lines). Commit periodically
and in the event of an error both data sources (input and output)
rollback to the last known good point.
* Scope
To keep things simple for now, assume that:
* All lines in the file are in the same format and the final
output is an aggregate.
* The files are read and written synchronously by a single
consumer.
* This use case requires two kinds of transactional file source.
One is read-only and the other is write-only. Only one consumer
can use the write-only source at a time.
* Preconditions
* An input file exists in the right format, with a sufficiently
large number of lines to be realistic.
* Success
Integration test confirms that
* All data are processed and output produced successfully.
* Description
Very similar to the use case {{{./chunks.html}Copy File to
Database}}, but involving transactional access to an output source
which is a file. Also we are introducing the idea of an aggregate
function for the output.
The vanilla successful case proceeds as in the file to database
version, except that:
[[1]] A successful chunk results in a line in an intermediate file
output source.
[[1]] After all chunks are successfully processed the intermediate
file is itself processed in a single transaction to complete the
aggregate. The output is itself sent to an output channel
(e.g. database or file).
* Variations
* Chunk failure variations proceed as in the use case
{{{./chunks.html}Copy File to Database}}. In the case of a
restart after fatal failure, the intermediate output file need does
not need to be reset or re-created.
* Implementation
* The write-only file source is new in this use case. It has a
similar flavour to the read-only version, but also has more serious
implications for implementation and usage. Since a file system is
not inherently transactional, when we create the write-only data
source we are assuming that consumers will play by the rules,
principally that there is only one consumer at a time.
* With some external limitations the write-only file source can be
implemented so that within a single JVM it will behave like a
transactional database datasource. We can provide a
<<<FlatFileItemWriter>>> that hides the resource acquisition and
release, and interacts with an existing transaction to provide the
transactional behaviour that is required.
* File-based transactional resources are a lot like messaging
clients. We can send a message (write a line) through a sender
client, and receive a message (read a line) through a consumer
client. In the case of a transaction rollback, all sent messages
are guaranteed not to reach consumers, and all received messages are
returned to the queue. Maybe ActiveMQ has a file transport already?
Mule definitely does, but it isn't transactional.
