Parsing MT940 SWIFT Message using Java REGEX

Discussion:

Arun

2008-12-27 19:13:26 UTC

Hi Folks,

I have two SWIFT messages in a file. I have read the entire file into
a StringBuffer. Now using java.util.regex, I am able to retrieve SWIFT
message blocks 1 ( start with {1: , end with } ) and 2 ( start with
{2: and end with } )

However, I am unable to grab block 4 ( starts with {4: and ends with
the first occurrence of -} ). My regex pattern is \\{4:.*-\\} .
However, this picks up the message until the last occurrence of -}. I
am not sure how to restrict the regex to stop looking beyond the first
occurrence of -} . Can you assist please?

Thank you,
Arun

{1:F01AAAABB99BSMK3513951576}
{2:O9400934081223BBBBAA33XXXX03592332770812230834N}{4:
:20:0112230000000894
:25:GSAKW827958933CAD
:28C:255/1
:60F:C011223CAD32,55
:62F:C011223CAD32,55
-}{5:
{CHK:794BB7656E00}}
{1:F01AAAABB99BSMK3513951576}
{2:O9400934081223BBBBAA33XXXX03592332770812230834N}{4:
:20:0112230000000890
:25:SAKG800030155USD
:28C:255/1
:60F:C011223USD175768,92
:61:0112201223CD110,92NDIVNONREF//08 IL053309
/GB/2542049/SHS/312,
:62F:C011021USD175879,84
-}{5:
{CHK:0F4E5614DD28}}

John B. Matthews

2008-12-27 19:22:09 UTC

In article

Post by Arun
I have two SWIFT messages in a file. I have read the entire file into
a StringBuffer. Now using java.util.regex, I am able to retrieve SWIFT
message blocks 1 ( start with {1: , end with } ) and 2 ( start with
{2: and end with } )
However, I am unable to grab block 4 ( starts with {4: and ends with
the first occurrence of -} ). My regex pattern is \\{4:.*-\\} .
However, this picks up the message until the last occurrence of -}. I
am not sure how to restrict the regex to stop looking beyond the first
occurrence of -} . Can you assist please?

You might try a reluctant quantifier: \\{4:.*?-\\} (untested).
Does a {4: block include line terminators?

<http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html>
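A minimal sketch of the difference between the greedy and the reluctant form, assuming the whole file is already in memory. The class and method names (Block4Demo, extract) are made up for illustration. Pattern.DOTALL is an assumption too: without it, . does not match the line terminators inside a {4: block.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Block4Demo {
    // Hypothetical helper: returns every match of regex in text.
    // DOTALL lets '.' match line terminators, which a {4: block contains.
    static List<String> extract(String regex, CharSequence text) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(text);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Two abbreviated messages, back to back as in the sample file.
        String text =
              "{1:F01AAAABB99BSMK3513951576}{4:\n:20:0112230000000894\n-}{5:\n{CHK:794BB7656E00}}\n"
            + "{1:F01AAAABB99BSMK3513951576}{4:\n:20:0112230000000890\n-}{5:\n{CHK:0F4E5614DD28}}\n";

        // Greedy: .* runs to the LAST -} so both messages land in one match.
        System.out.println(extract("\\{4:.*-\\}", text).size());   // prints 1
        // Reluctant: .*? stops at the FIRST -} after each {4:
        System.out.println(extract("\\{4:.*?-\\}", text).size());  // prints 2
    }
}
```

The reluctant form returns one match per message, so each {4: block can then be processed on its own.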

Post by Arun
{1:F01AAAABB99BSMK3513951576}
{2:O9400934081223BBBBAA33XXXX03592332770812230834N}{4:
:20:0112230000000894
:25:GSAKW827958933CAD
:28C:255/1
:60F:C011223CAD32,55
:62F:C011223CAD32,55
-}{5:
{CHK:794BB7656E00}}
{1:F01AAAABB99BSMK3513951576}
{2:O9400934081223BBBBAA33XXXX03592332770812230834N}{4:
:20:0112230000000890
:25:SAKG800030155USD
:28C:255/1
:60F:C011223USD175768,92
:61:0112201223CD110,92NDIVNONREF//08 IL053309
/GB/2542049/SHS/312,
:62F:C011021USD175879,84
-}{5:
{CHK:0F4E5614DD28}}

--
John B. Matthews
trashgod at gmail dot com
http://home.roadrunner.com/~jbmatthews/

Arun

2008-12-27 19:42:45 UTC

Post by John B. Matthews
In article

You might try a reluctant quantifier: \\{4:.*?-\\} (untested).
Does a {4: block include line terminators?
<http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html>

--
John B. Matthews
trashgod at gmail dot com
http://home.roadrunner.com/~jbmatthews/

John,
Thank you. It worked. I am reading through the reluctant quantifiers
now. And yes, {4: has a line terminator.

With your assistance I was able to grab each of the messages
separately.

In the above example, I have a multiline field (:61: followed by
text, followed by a CRLF/line terminator and a next line of text),
followed by :62F:.

:61:0112201223CD110,92NDIVNONREF//08 IL053309
/GB/2542049/SHS/312,
:62F:C011021USD175879,84

Here the line following the line containing :61: is optional like
:61:0112201223CD110,92NDIVNONREF//08 IL053309
:62F:C011021USD175879,84

or the third line could be another starting with :61: like

:61:0112201223CD110,92NDIVNONREF//08 IL053309
/GB/2542049/SHS/312,
:61:0112201223CD110,92NDIVNONREF//08 IL053309
/GB/2542049/SHS/312,

I wrote something like
((:61:)(\\d{6})([\\d]{4})([CD]?[A-Z]?)(\\d*[,]?\\d*)([\\w\\S]{4})(.*&&[^:]))

It did not work. :(

Where could I be wrong?

Thank you very much.
Arun

John B. Matthews

2008-12-27 21:46:54 UTC

In article

Post by Arun

Post by John B. Matthews
In article

You might try a reluctant quantifier: \\{4:.*?-\\} (untested).
Does a {4: block include line terminators?
<http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html>

[...]

Post by Arun
Thank you. It worked. I am reading through the reluctant quantifiers
now. And yes, {4: has a line terminator.
With your assistance I was able to grab each of the messages
separately.
In the above example, I have a multiline message (:61: followed by
text, followed by a crlf/line terminator and a next line of text
followed by :62F:.

[...]

Post by Arun
I wrote something like
((:61:)(\\d{6})([\\d]{4})([CD]?[A-Z]?)(\\d*[,]?\\d*)([\\w\\S]{4})(.*&&[^:]))
It did not work. :(
Where could I be wrong?

Sorry, I don't understand SWIFT message syntax well enough to comment.
IIUC, a pre-XML SWIFT parser is non-trivial. You might Google for an
existing solution.

--
John B. Matthews
trashgod at gmail dot com
http://home.roadrunner.com/~jbmatthews/

Arun

2008-12-28 04:11:11 UTC

Post by John B. Matthews
In article

Post by Arun

Post by John B. Matthews
In article

You might try a reluctant quantifier: \\{4:.*?-\\} (untested).
Does a {4: block include line terminators?
<http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html>

[...]

Post by Arun
I wrote something like
((:61:)(\\d{6})([\\d]{4})([CD]?[A-Z]?)(\\d*[,]?\\d*)([\\w\\S]{4})(.*&&[^:]))
It did not work. :(
Where could I be wrong?

Sorry, I don't understand SWIFT message syntax well enough to comment.
IIUC, a pre-XML SWIFT parser is non-trivial. You might Google for an
existing solution.
--
John B. Matthews
trashgod at gmail dot com
http://home.roadrunner.com/~jbmatthews/

John,

Simply put, in the lines below,

:61:0112201223CD110,92NDIVNONREF//08 IL053309
/GB/2542049/SHS/312,
:62F:C011021USD175879,84

I need to grab line 1&2 in a buffer separately. The rule is start
from :61: and read until i encounter the next :.

Thank you very much
Arun

Lew

2008-12-28 04:51:32 UTC

Post by Arun
Simply put, in the lines below,

:61:0112201223CD110,92NDIVNONREF//08 IL053309
/GB/2542049/SHS/312,
:62F:C011021USD175879,84

Post by Arun
I need to grab line 1&2 in a buffer separately. The rule is start
from :61: and read until i [sic] encounter the next :.

What about line 3?

I think I understand what you were saying, but Usenet wraps lines, so it's
tricky to refer to line numbers that might not match what people are reading.

--
Lew

Arun

2008-12-28 05:00:29 UTC

Post by Arun
Simply put, in the lines below,

:61:0112201223CD110,92NDIVNONREF//08 IL053309
/GB/2542049/SHS/312,
:62F:C011021USD175879,84

Post by Arun
I need to grab line 1&2 in a buffer separately. The rule is start
from :61: and read until i [sic] encounter the next :.

What about line 3?
I think I understand what you were saying, but Usenet wraps lines, so it's
tricky to refer to line numbers that might not match what people are reading.
--
Lew

Hi Lew,

In my example

LINE 1 -> :61:0112201223CD110,92NDIVNONREF//08 IL053309
LINE 2 -> /GB/2542049/SHS/312,
LINE 3 -> :62F:C011021USD175879,84

Here LINE 2 can be any text, basically a (.*).

LINE 3 could be another line starting with a ":".

My requirement is: if the line starts with :61:, match all characters
until you see the next ":" (not specifically :62F: as in the above
example, because LINE 1 can repeat and LINE 2 may or may not occur
after LINE 1).

Did I understand your question correctly? And did I give a correct
response? Please let me know.

Thank you
Arun

John B. Matthews

2008-12-28 14:40:05 UTC

In article
<1bc8919e-3639-4ad7-aafc-***@b38g2000prf.googlegroups.com>,
Arun <***@gmail.com> wrote:

[...]

Post by Arun
In my example
LINE 1 -> :61:0112201223CD110,92NDIVNONREF//08 IL053309
LINE 2 -> /GB/2542049/SHS/312,
LINE 3 -> :62F:C011021USD175879,84
Here LINE 2 can be any text , basically a (.*) .
My requirement is if the line starts with :61: , match all characters
until you see a next ":" ( and not :62F: as in above example because
LINE 1 is repetitive, LINE 2 may or may not occur after LINE 1 ).

[...]

Do you mean like this:

<sscce>
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Splitting {
    public static void main(String[] args) {
        String s = ""
            + ":60F:C011223USD175768,92\n"
            + ":61:0112201223CD110,92NDIVNONREF//08 IL053309\n"
            + "/GB/2542049/SHS/312,\n"
            + ":62F:C011021USD175879,84\n";
        // Each match runs from a tag's opening colon up to, but not
        // including, the colon that starts the next tag.
        Pattern p = Pattern.compile(
            "(:.*?:.[^:]+)", Pattern.DOTALL);
        Matcher m = p.matcher(s);
        int i = 1;
        while (m.find()) {
            // m.group() still ends with the field's own line terminator.
            System.out.println("(" + i++ + ") " + m.group());
        }
    }
}
</sscce>

<console>
(1) :60F:C011223USD175768,92

(2) :61:0112201223CD110,92NDIVNONREF//08 IL053309
/GB/2542049/SHS/312,

(3) :62F:C011021USD175879,84

</console>

See also:

<http://java.sun.com/docs/books/tutorial/essential/regex/>

--
John B. Matthews
trashgod at gmail dot com
http://home.roadrunner.com/~jbmatthews/

Arun

2008-12-28 15:39:06 UTC

Post by John B. Matthews
In article
[...]> In my example

Post by Arun
LINE 1 -> :61:0112201223CD110,92NDIVNONREF//08 IL053309
LINE 2 -> /GB/2542049/SHS/312,
LINE 3 -> :62F:C011021USD175879,84
Here LINE 2 can be any text , basically a (.*) .
My requirement is if the line starts with :61: , match all characters
until you see a next ":" ( and not :62F: as in above example because
LINE 1 is repetitive, LINE 2 may or may not occur after LINE 1 ).

[...]
<sscce>
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Splitting {
    public static void main(String[] args) {
        String s = ""
            + ":60F:C011223USD175768,92\n"
            + ":61:0112201223CD110,92NDIVNONREF//08 IL053309\n"
            + "/GB/2542049/SHS/312,\n"
            + ":62F:C011021USD175879,84\n";
        Pattern p = Pattern.compile(
            "(:.*?:.[^:]+)", Pattern.DOTALL);
        Matcher m = p.matcher(s);
        int i = 1;
        while (m.find()) {
            System.out.println("(" + i++ + ") " + m.group());
        }
    }
}
</sscce>
<console>
(1) :60F:C011223USD175768,92
(2) :61:0112201223CD110,92NDIVNONREF//08 IL053309
/GB/2542049/SHS/312,
(3) :62F:C011021USD175879,84
</console>
<http://java.sun.com/docs/books/tutorial/essential/regex/>
--
John B. Matthews
trashgod at gmail dot com
http://home.roadrunner.com/~jbmatthews/

John,

Yes. It worked. Thank you so much. Your regex is generic and it worked
for all tags.

Thanks much. I appreciate that.

Arun

Arun

2008-12-29 13:57:23 UTC

Post by John B. Matthews
In article
[...]> In my example

John,

In the below example

LINE 1 -> :61:0112201223CD110,92NDIVNONREF//08 IL053309
LINE 2 -> /GB/2542049/SHS/312,
LINE 3 -> :62F:C011021USD175879,84

I tried to split lines 1 and 2 into logical groups ( for clarity,
I have separated each token with parentheses )

:61:(011220)(1223)(CD)(110,92)(NDIV)(NONREF//08 IL053309)
(/GB/2542049/SHS/312,)

using the following regex pattern
:61:(\\d{6})(\\d{4})([CD]?[A-Z]?)(\\d*[\\,]?\\d*)(\\w{4})(.*?\\n)(.*?[^:]+)

however, I am not able to grab the second line using matcher.group(i)
where i is the group number.

What is wrong in )(.*?[^:]+) ?

Thank you
Arun

:61:(\\d{6})(\\d{4})([CD]?[A-Z]?)(\\d*[\\,]?\\d*)(\\w{4})(.*?\\n)(.*?[^:]+)

John B. Matthews

2008-12-29 15:29:34 UTC

In article
<ba2345eb-e7ab-471f-98fa-***@x8g2000yqk.googlegroups.com>,
Arun <***@gmail.com> wrote:

[...]

Post by Arun
In the below example
LINE 1 -> :61:0112201223CD110,92NDIVNONREF//08 IL053309
LINE 2 -> /GB/2542049/SHS/312,
LINE 3 -> :62F:C011021USD175879,84
I tried to split line 1 and 2 into logical groups ( for clarity
purpose I had separated each token with braces )
:61:(011220)(1223)(CD)(110,92)(NDIV)(NONREF//08 IL053309)
(/GB/2542049/SHS/312,)
using the following regex pattern
:61:(\\d{6})(\\d{4})([CD]?[A-Z]?)(\\d*[\\,]?\\d*)(\\w{4})(.*?\\n)(.*?[^:]+)
however, I am not able to grab the second line using matcher.group(i)
where i is the group number.
What is wrong in )(.*?[^:]+) ?
[...]
:61:(\\d{6})(\\d{4})([CD]?[A-Z]?)(\\d*[\\,]?\\d*)(\\w{4})(.*?\\n)(.*?[^:]+)

I don't understand. Perhaps you could modify the <http://sscce.org/> I
provided above to clarify the problem. The following tutorial shows how
to catch syntax errors using the methods of PatternSyntaxException:

<http://java.sun.com/docs/books/tutorial/essential/regex/>
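One possible reading of the question, as a hedged sketch rather than a definitive answer. Matcher.group(i) throws IllegalStateException unless find() or matches() has succeeded first, so that is worth checking. Separately, the last group (.*?[^:]+) needs at least one character, so when the optional second line is absent it would pick up the start of the next tag (":62F" in this data). The variant below keeps groups 1 to 5 from the question and changes groups 6 and 7 so that neither crosses a line terminator. It assumes a second line never starts with ":"; the names Field61Demo and groups are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Field61Demo {
    // Groups 1-5 are taken from the question. Group 6 is the rest of the
    // first line, group 7 the optional second line.
    // Assumption: a second line never starts with ':'.
    static final Pattern FIELD_61 = Pattern.compile(
          ":61:(\\d{6})(\\d{4})([CD]?[A-Z]?)(\\d*,?\\d*)(\\w{4})"
        + "([^\\r\\n]*)"                      // group 6: rest of line 1
        + "(?:\\r?\\n(?!:)([^\\r\\n]+))?");   // group 7: optional line 2

    // Returns the seven groups of the first :61: field found in text,
    // or an empty array if there is none. Group 7 is null when absent.
    static String[] groups(CharSequence text) {
        Matcher m = FIELD_61.matcher(text);
        if (!m.find()) {
            return new String[0];
        }
        String[] g = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            g[i - 1] = m.group(i);
        }
        return g;
    }

    public static void main(String[] args) {
        String s = ":60F:C011223USD175768,92\n"
                 + ":61:0112201223CD110,92NDIVNONREF//08 IL053309\n"
                 + "/GB/2542049/SHS/312,\n"
                 + ":62F:C011021USD175879,84\n";
        String[] g = groups(s);
        for (int i = 0; i < g.length; i++) {
            System.out.println("group(" + (i + 1) + ") = " + g[i]);
        }
    }
}
```

With the sample above, group(6) is the reference text without its line terminator and group(7) is the /GB/ line; if the /GB/ line is removed, group(7) comes back as null instead of swallowing part of :62F:.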

--
John B. Matthews
trashgod at gmail dot com
http://home.roadrunner.com/~jbmatthews/

Arun

2008-12-29 16:26:20 UTC

Post by John B. Matthews
In article
[...]

I don't understand. Perhaps you could modify the <http://sscce.org/> I
provided above to clarify the problem. The following tutorial shows how
to catch syntax errors using the methods of PatternSyntaxException:
<http://java.sun.com/docs/books/tutorial/essential/regex/>
--
John B. Matthews
trashgod at gmail dot com
http://home.roadrunner.com/~jbmatthews/

I think I did not explain my requirement.

I have 3 lines

LINE 1 -> :61:0112201223CD110,92NDIVNONREF//08 IL053309
LINE 2 -> /GB/2542049/SHS/312,
LINE 3 -> :62F:C011021USD175879,84

And I grab line 1 & 2 using pattern "(:61:.*?.[^:]+)" and copy it to a
StringBuffer

Now, with the matcher.group(int arg) function, I need to group the
sequence so that I can get the 2nd line.

matcher1.group(1) should return
:61:0112201223CD110,92NDIVNONREF//08 IL053309 ( along with the \n )
and matcher1.group(2) should return /GB/2542049/SHS/312,

This regex is harassing me!!!

Thank you
Arun

John B. Matthews

2008-12-29 17:38:30 UTC

In article
[...]

Post by Arun

Post by John B. Matthews
<http://java.sun.com/docs/books/tutorial/essential/regex/>

What syntax errors did this approach discover?

[Please trim sigs.]

Post by Arun
I think I did not explain my requirement.
I have 3 lines
LINE 1 -> :61:0112201223CD110,92NDIVNONREF//08 IL053309
LINE 2 -> /GB/2542049/SHS/312,
LINE 3 -> :62F:C011021USD175879,84
And I grab line 1 & 2 using pattern "(:61:.*?.[^:]+)" and copy it to
a StringBuffer. Now, with matcher.group(int arg) function, i need to
group the sequence so that i can get the 2nd line.
matcher1.group(1) should return :61:0112201223CD110,92NDIVNONREF//08
IL053309 ( along with the \n ) and matcher1.group(2) should return /GB/
2542049/SHS/312,

[...]

You could try matching the \n:

Pattern p = Pattern.compile("(^.*\n)(.*\n)", Pattern.DOTALL);
Matcher m = p.matcher(s);
if (m.matches()) ...

Again, an <http://sscce.org/> would make discussion easier.
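A runnable version of the snippet above, as a sketch only; the class and method names (TwoLineDemo, split) are made up. One thing it makes visible: with Pattern.DOTALL the . also matches \n, so if the input has more than two lines, group(1) absorbs the extra ones. Passing 0 instead of DOTALL keeps each group to a single line, at the cost that matches() then fails on anything other than exactly two lines.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoLineDemo {
    // Applies "(^.*\n)(.*\n)" with the given flags and returns
    // { group(1), group(2) }, or null if the whole input does not match.
    static String[] split(String s, int flags) {
        Matcher m = Pattern.compile("(^.*\n)(.*\n)", flags).matcher(s);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String twoLines = ":61:0112201223CD110,92NDIVNONREF//08 IL053309\n"
                        + "/GB/2542049/SHS/312,\n";
        String[] g = split(twoLines, Pattern.DOTALL);
        // Both groups keep their trailing \n, so print() is enough here.
        System.out.print("group(1) = " + g[0]);
        System.out.print("group(2) = " + g[1]);

        // With a third line and DOTALL, group(1) holds the first TWO lines.
        String threeLines = twoLines + ":62F:C011021USD175879,84\n";
        System.out.print("group(1) = " + split(threeLines, Pattern.DOTALL)[0]);
        // Without DOTALL the three-line input does not match at all.
        System.out.println(split(threeLines, 0) == null);
    }
}
```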

[Please trim sigs.]

--
John B. Matthews
trashgod at gmail dot com
http://home.roadrunner.com/~jbmatthews/

s***@gmail.com

2016-10-18 12:41:31 UTC

Post by John B. Matthews
In article
[...]

Post by Arun

Post by John B. Matthews
<http://java.sun.com/docs/books/tutorial/essential/regex/>

What syntax errors did this approach discover?
[Please trim sigs.]

[...]
Pattern p = Pattern.compile("(^.*\n)(.*\n)", Pattern.DOTALL);
Matcher m = p.matcher(s);
if (m.matches()) ...
Again, an <http://sscce.org/> would make discussion easier.
[Please trim sigs.]
--
John B. Matthews
trashgod at gmail dot com
http://home.roadrunner.com/~jbmatthews/

Hi John, I came across one of your posts today regarding parsing an MT940 file. I am working on a requirement where the file looks as below:
:61:161107D6243,23NXPC2000136822
:86:XPC?00ISSUANCE?20INV:5111107901 DTE:20161107 AMT:742.00?21INV
:5111107903 DTE:20161107 AMT:994.74?22INV:5111107869 DTE:201611
07 AMT:479.00?23INV:5111107872 DTE:20161107 AMT:850.00?24INV:511
1107873 DTE:20161107 AMT:500.44?25INV:5111107875 DTE:20161107 AMT
:634.30?26INV:5111107897 DTE:20161107 AMT:405.10?27INV:51111079
00 DTE:20161107 AMT:1020.25?27INV:5111107867 DTE:20161107 AMT:61
7.40?30CITISUPLFIN?31109087?32LOOS AND CO INC?3324356
I want only the key tags to start a line with a colon, not the continuation lines. The expected output is each field on a single straight line instead of spread over multiple lines. Please help me.

- Muru
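One way this could be approached, as a sketch under stated assumptions: treat a line as the start of a field only if it begins with a colon, two digits, an optional uppercase letter, and another colon (which covers :61:, :86:, :62F: and so on), and append every other line to the field before it. That rule keeps lines such as ":5111107903 DTE:..." as continuations even though they start with a colon. Joining with no separator is also an assumption, chosen because the sample wraps in the middle of tokens. The names UnwrapFields and unwrap are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class UnwrapFields {
    // Assumption: a tag is ':' + two digits + optional uppercase letter + ':'.
    private static final Pattern TAG = Pattern.compile("^:\\d{2}[A-Z]?:");

    // Returns one string per field, with wrapped lines joined back together.
    static List<String> unwrap(String block) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = null;
        for (String line : block.split("\\r?\\n")) {
            if (TAG.matcher(line).find()) {
                // A new tag starts: flush the field collected so far.
                if (current != null) {
                    fields.add(current.toString());
                }
                current = new StringBuilder(line);
            } else if (current != null) {
                // Continuation line: glue it onto the current field.
                current.append(line);
            }
        }
        if (current != null) {
            fields.add(current.toString());
        }
        return fields;
    }

    public static void main(String[] args) {
        String block = ":61:161107D6243,23NXPC2000136822\n"
            + ":86:XPC?00ISSUANCE?20INV:5111107901 DTE:20161107 AMT:742.00?21INV\n"
            + ":5111107903 DTE:20161107 AMT:994.74?22INV:5111107869 DTE:201611\n"
            + "07 AMT:479.00?23INV:5111107872 DTE:20161107 AMT:850.00\n";
        for (String field : unwrap(block)) {
            System.out.println(field);
        }
    }
}
```

A continuation line that happens to look exactly like a tag (for example one starting with ":20:") would be misread as a new field, so this is not a substitute for a real MT940 parser.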

Arun

2008-12-28 05:02:28 UTC

Post by Arun
Simply put, in the lines below,

:61:0112201223CD110,92NDIVNONREF//08 IL053309
/GB/2542049/SHS/312,
:62F:C011021USD175879,84

Post by Arun
I need to grab line 1&2 in a buffer separately. The rule is start
from :61: and read until i [sic] encounter the next :.

What about line 3?
I think I understand what you were saying, but Usenet wraps lines, so it's
tricky to refer to line numbers that might not match what people are reading.
--
Lew

Lew,

I am enclosing each line between parentheses ().

(:61:0112201223CD110,92NDIVNONREF//08 IL053309 )
(/GB/2542049/SHS/312,)
(:62F:C011021USD175879,84)

Thank you
Arun

Lew

2008-12-27 19:24:23 UTC

Post by Arun
Hi Folks,
I have two SWIFT messages in a file. I have read the entire file into
a StringBuffer. Now using java.util.regex, I am able to retrieve SWIFT
message blocks 1 ( start with {1: , end with } ) and 2 ( start with
{2: and end with } )
However, I am unable to grab block 4 ( starts with {4: and ends with
the first occurrence of -} ). My regex pattern is \\{4:.*-\\} .
However, this picks up the message until the last occurrence of -}. I
am not sure how to restrict the regex to stop looking beyond the first
occurrence of -} . Can you assist please?

Looks like a case for the reluctant quantifier
<http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html>
<http://java.sun.com/docs/books/tutorial/essential/regex/quant.html>

\\{4:.*?-\\}

if I read the docs correctly.

--
Lew

Roedy Green

2008-12-30 21:23:13 UTC

Post by Arun
However, I am unable to grab block 4 ( starts with {4: and ends with
the first occurrence of -} ). My regex pattern is \\{4:.*-\\} .
However, this picks up the message until the last occurrence of -}. I
am not sure how to restrict the regex to stop looking beyond the first
occurrence of -} . Can you assist please?

Just a general comment. Regex does not handle delimiter nesting of
variable depth. I did not follow the details of your message, but got
the general impression that might be the problem.

If you have such nesting you need a parser, for example one you roll
yourself with a finite state automaton, using an enum to track the
various states and a method State next( char ) to figure out which
state to go to next depending on the next char.

http://mindprod.com/jgloss/finitestate.html
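A minimal sketch of the hand-rolled idea, with one plainly named substitution: instead of an enum of states with a next( char ) method, it uses a simple depth counter, which is enough to cut a message into its top-level {...} blocks even when one block nests another, as {5:{CHK:...}} does. It assumes braces never appear inside field text; the names BlockScanner and topLevelBlocks are made up.

```java
import java.util.ArrayList;
import java.util.List;

public class BlockScanner {
    // Returns each top-level {...} block, nested braces included.
    // Assumption: '{' and '}' never occur inside the field text itself.
    static List<String> topLevelBlocks(CharSequence text) {
        List<String> blocks = new ArrayList<>();
        int depth = 0;   // how many '{' are currently open
        int start = -1;  // index of the '{' that opened the current block
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '{') {
                if (depth == 0) {
                    start = i;
                }
                depth++;
            } else if (c == '}' && depth > 0) {
                depth--;
                if (depth == 0) {
                    blocks.add(text.subSequence(start, i + 1).toString());
                }
            }
        }
        return blocks;
    }

    public static void main(String[] args) {
        String message = "{1:F01AAAABB99BSMK3513951576}"
            + "{2:O9400934081223BBBBAA33XXXX03592332770812230834N}"
            + "{4:\n:20:0112230000000894\n-}"
            + "{5:\n{CHK:794BB7656E00}}";
        for (String block : topLevelBlocks(message)) {
            System.out.println(block);
        }
    }
}
```

Because the scanner takes a CharSequence, it can be handed the StringBuffer directly; each returned block can then be examined with a much simpler regex.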

For tougher parsing you need a parser generator. See
http://mindprod.com/jgloss/parser.html

--
Roedy Green Canadian Mind Products
http://mindprod.com