I'm trying to parse a .DAT file that contains transaction data. Below is a sample line from it:
CABCDE123456000000000000000ABCD12345678XY PAYMENTS ABCD ELECTRONIC 12345678 AUTH CANCELLED 2025050800000000000000180000812345678 20250508ABCXXXXXXXXX 202505091234567ABCDEF BBB ABC
I want to convert this data to CSV and write it to a file every minute. However, I keep running into the following error:
Skipping line 0: missing AUTH CANCELLED → timestamp+XXXXXXXX segment
Skipping line 1: missing AUTH CANCELLED → timestamp+XXXXXXXX segment
Skipping line 2: missing AUTH CANCELLED → timestamp+XXXXXXXX segment
In my command, I have a method called parseFileContent that should get the data between the text AUTH CANCELLED and 2025050800000000000000180000812345678, but it keeps giving me the error above even though the data is present.
Here is a snippet of my code. For context, I'm interested in getting the following information: mobile number, amount, transaction ID and transaction date.
In the text 2025050800000000000000180000812345678, the mobile number is the last 10 digits (0812345678), the transaction date is the first 8 digits (20250508), and the amount is 18000, which should be written as 180.00. As for the transaction ID, the relevant part of the text 20250508ABCXXXXXXXXX is ABCXXXXXXXXX. What am I missing?
Your assistance will be highly appreciated. Thanks!
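To make the mapping described above concrete, here is a minimal sketch of the slicing. It assumes the layout inferred from the single sample line: the first 8 digits are the date, the last 10 digits are the mobile number, and everything in between is the amount in cents. The function name `splitDigitRun` is hypothetical and is not part of the code below.

```php
<?php
/**
 * Split the long digit run into date, amount and mobile number.
 * Assumption (from one sample line only): first 8 digits = date,
 * last 10 digits = mobile number, the digits in between = amount in cents.
 */
function splitDigitRun(string $run): ?array
{
    // Require digits only, and at least 8 + 1 + 10 of them.
    if (!preg_match('/^\d{19,}$/D', $run)) {
        return null;
    }

    $date   = substr($run, 0, 8);    // e.g. 20250508
    $mobile = substr($run, -10);     // e.g. 0812345678
    $cents  = substr($run, 8, -10);  // zero-padded amount, e.g. ...18000

    return [
        'date'   => $date,
        // The (int) cast drops the zero padding; dividing by 100 gives the decimal amount.
        'amount' => number_format(((int) $cents) / 100, 2, '.', ''),
        'mobile' => $mobile,
    ];
}

$fields = splitDigitRun('2025050800000000000000180000812345678');
// $fields['date'] is '20250508', $fields['amount'] is '180.00',
// $fields['mobile'] is '0812345678'
```

Slicing from both ends avoids hard-coding the width of the amount field, which cannot be confirmed from one sample line.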
private function parseFileContent(string $content, string $fileName): Collection
{
    $results = collect();
    $lines = explode("\n", $content);

    foreach ($lines as $lineNumber => $line) {
        $line = trim($line);

        // Match everything between "AUTH CANCELLED" and the next 14-digit timestamp + XXXXXXXX
        if (preg_match('/AUTH\s+CANCELLED\s+(.*?)\s+\d{14}XXXXXXXX/i', $line, $segmentMatch)) {
            $segment = trim($segmentMatch[1]);

            // Match: 8-digit date, 20-digit amount, 10-digit mobile number (may have varying spaces in between)
            if (preg_match('/(\d{8})\s*(\d{20})\s*(\d{10})/', $segment, $matches)) {
                $dateRaw = $matches[1];
                $amountRaw = $matches[2];
                $mobileRaw = $matches[3];

                $cleanAmount = ltrim($amountRaw, '0');
                $amount = number_format(((int)($cleanAmount ?: '0')) / 100, 2, '.', '');

                // Try to extract transaction ID that follows the date
                $transactionId = '';
                if (preg_match('/' . preg_quote($dateRaw, '/') . '\s*([A-Z0-9]{5,})/', $segment, $tm)) {
                    $transactionId = trim($tm[1]);
                }

                $results->push([
                    'file' => $fileName,
                    'line' => $lineNumber + 1,
                    'date' => $this->parseDate($dateRaw),
                    'amount' => $amount,
                    'mobile_number' => $mobileRaw,
                    'transaction_id' => $transactionId,
                    'raw_line' => $line
                ]);
            } else {
                $this->warn("No transaction match in scoped segment on line {$lineNumber}: {$segment}");
            }
        } else {
            $this->warn("Skipping line {$lineNumber}: missing AUTH CANCELLED → timestamp+XXXXXXXX segment");
        }
    }

    return $results;
}
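The outer pattern can be checked in isolation against the sample line. In the sample, the token after the long digit run is 20250508ABCXXXXXXXXX, that is 8 digits followed by letters, so `\d{14}XXXXXXXX` has nothing to match and the else branch fires. The sketch below assumes the sample line is representative; the `$looser` pattern is a hypothetical alternative, not a known specification of the format.

```php
<?php
// Sample line copied from the question.
$line = 'CABCDE123456000000000000000ABCD12345678XY PAYMENTS ABCD ELECTRONIC 12345678 '
      . 'AUTH CANCELLED 2025050800000000000000180000812345678 20250508ABCXXXXXXXXX '
      . '202505091234567ABCDEF BBB ABC';

// Pattern from parseFileContent: expects 14 digits directly followed by XXXXXXXX.
$original    = '/AUTH\s+CANCELLED\s+(.*?)\s+\d{14}XXXXXXXX/i';
$originalHit = preg_match($original, $line); // 0: no token has 14 digits followed by X's

// Hypothetical looser pattern (assumption): a run of at least 19 digits,
// whitespace, then an 8-digit date followed by an uppercase alphanumeric ID.
$looser    = '/AUTH\s+CANCELLED\s+(\d{19,})\s+\d{8}([A-Z0-9]+)/';
$looserHit = preg_match($looser, $line, $m); // 1

$digitRun      = $m[1]; // the long run holding date, amount and mobile number
$transactionId = $m[2]; // 'ABCXXXXXXXXX'
```

If transaction IDs can start with a digit, `\d{8}([A-Z0-9]+)` still takes exactly the first 8 digits as the date, which is only correct if the date prefix really has a fixed width of 8.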
\d{14}XXXXXXXX, but \d{16}XXXXXXXX. Haven't you got more precise information about the syntax of the data format? Is it something standard or related to a custom application log or data format? We need to know all the possible cases you could potentially have in your *.dat files.

*.dat files. If you can't find any parser for this specific file format, then OK, code it yourself. How big are these files? I presume you read a file completely and put it in memory, passed via the $content parameter. If the file is 500 MB, you might get into trouble. Reading line by line with fgets() might consume less memory.

Is XXXXXXXX really in the *.dat file, or is it just data anonymization for the question? If it is anonymization, please replace it by 12345678 or something similar, so that we can understand and help. I ask this because 08XXXXXXXX isn't a valid mobile number.

XXXXXXXX by 12345678 and add a few more examples? Actually, you only have a single line of content, so this won't really help us find out the possible variations of your file format. And no, I don't know of any parser, as we don't have any specification of the format. *.dat can be almost any kind of binary content, for a huge number of applications. If you know where the files come from, you could probably find some specs and give us more details.
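The fgets() suggestion above can be sketched as a generator that yields one line at a time, so the whole file never has to sit in memory. The function name `readLines` is hypothetical. The sketch assumes the file is newline-delimited text, which has not been confirmed for this .dat format.

```php
<?php
/**
 * Yield non-empty, trimmed lines keyed by their 1-based line number.
 * Only the current line is held in memory at any time.
 */
function readLines(string $path): Generator
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open {$path}");
    }

    try {
        $lineNumber = 0;
        while (($line = fgets($handle)) !== false) {
            $lineNumber++;
            $line = trim($line); // also strips a trailing "\r" from CRLF files
            if ($line !== '') {
                yield $lineNumber => $line;
            }
        }
    } finally {
        // Runs when the loop ends or the generator is destroyed early.
        fclose($handle);
    }
}

// Usage (the path is a placeholder):
// foreach (readLines('/path/to/transactions.dat') as $lineNumber => $line) {
//     // parse $line here
// }
```

Keying by the real line number keeps warning messages accurate even though blank lines are skipped.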