I need to do a "find and replace" on about 45k lines of a CSV file and then put this into a database.

I figured I should be able to do this with PHP and preg_replace but can't seem to figure out the expression...

The lines consist of one field and are all in the following format:

"./1/024/9780310320241/SPSTANDARD.9780310320241.jpg" or "./t/fla/8204909_flat/SPSTANDARD.8204909_flat.jpg"

The first part will always be a period, the second part will always be one alphanumeric character, the third will always be three alphanumeric characters and the fourth should always be between 1 and 13 alphanumeric characters.

I came up with the following, which seems to be right. However, I will openly profess to not knowing very much at all about regular expressions; it's a little new to me! I'm probably making a whole load of silly mistakes here...

$pattern = "/^(\.\/[0-9a-zA-Z]{1}\/[0-9a-zA-Z]{3}\/[0-9a-zA-Z]{1,13}\/)$/";
$new = preg_replace($pattern, " ", $i);

Anyway any and all help appreciated!

Thanks, Phil

1
  • are the jpg filenames always 13 characters long? Commented Sep 8, 2009 at 10:24

5 Answers

1

The only mistake I can see is the end-of-string anchor $, which should be removed. Your expression is also missing the _ character:

/^(\.\/[0-9a-zA-Z]{1}\/[0-9a-zA-Z]{3}\/[0-9a-zA-Z_]{1,13}\/)/

A more general pattern would be to just exclude the /:

/^(\.\/[^\/]{1}\/[^\/]{3}\/[^\/]{1,13}\/)/
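
As a minimal sketch, here is the first corrected pattern applied to the two sample lines from the question. The variable names are illustrative, and the replacement is an empty string rather than the question's single space, on the assumption that only the file name is wanted afterwards.

```php
<?php
// Sample lines taken from the question.
$lines = [
    './1/024/9780310320241/SPSTANDARD.9780310320241.jpg',
    './t/fla/8204909_flat/SPSTANDARD.8204909_flat.jpg',
];

// No trailing $ anchor; underscore added to the last character class.
$pattern = '/^(\.\/[0-9a-zA-Z]{1}\/[0-9a-zA-Z]{3}\/[0-9a-zA-Z_]{1,13}\/)/';

$names = [];
foreach ($lines as $line) {
    // Assumption: strip the prefix entirely instead of replacing it with ' '.
    $names[] = preg_replace($pattern, '', $line);
}

print_r($names);
```

Writing the pattern in a single-quoted PHP string keeps the backslashes exactly as typed, which makes it easier to compare with the regex shown above.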

1 Comment

Thanks, works fine now! Nice to know I was only making one tiny mistake! The second example throws out an error however! Warning: preg_replace() [function.preg-replace]: Unknown modifier ']' The first one works fine though. Thanks again!
1

You should use PHP's built-in CSV parser to extract the values from the file before matching any patterns.
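
A rough sketch of that approach, assuming the parser meant here is fgetcsv(). The in-memory stream stands in for the real file (in practice it would be something like fopen() on the CSV path), and the pattern is borrowed from another answer on this page rather than from this one.

```php
<?php
// Stand-in for the real CSV file, so the sketch runs on its own.
$handle = fopen('php://memory', 'w+');
fwrite($handle, "./1/024/9780310320241/SPSTANDARD.9780310320241.jpg\n");
fwrite($handle, "./t/fla/8204909_flat/SPSTANDARD.8204909_flat.jpg\n");
rewind($handle);

// Pattern from another answer: no $ anchor, underscore allowed.
$pattern = '/^(\.\/[0-9a-zA-Z]{1}\/[0-9a-zA-Z]{3}\/[0-9a-zA-Z_]{1,13}\/)/';

$files = [];
// Separator, enclosure and escape are passed explicitly for clarity.
while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
    if ($row === [null]) {
        continue; // fgetcsv reports a blank line as an array holding one null
    }
    // Each line has a single field; apply the regex to the parsed value.
    $files[] = preg_replace($pattern, '', $row[0]);
}
fclose($handle);

print_r($files);
```

Parsing first means quoting and escaping are handled by the CSV reader, so the regex only ever sees the field value itself.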

2 Comments

The values do not have quotation marks surrounding them in the file that this is processing. Purely out of educational interest how would I go about performing the same pattern replacement without using regex? I wouldn't know where to begin I'm afraid.
Sorry, I didn't read your question well enough. I guess you must use regular expressions here, but I would extract the values out of the csv first, and apply the RE afterwards.
0

I'm not sure I understand what you're asking. Do you mean every line in the file looks like that, and you want to process all of them? If so, this regex would do the trick:

'#^.*/#' 

That simply matches everything up to and including the last slash, which is what your regex would do if it weren't for that rogue '$' everyone's talking about. If there are other lines in other formats that you want to leave alone, this regex will probably suit your needs:

'#^\./\w/\w{3}/\w{1,13}/#'

Notice how I changed the regex delimiter from '/' to '#' so I don't have to escape the slashes inside. You can use almost any punctuation character for the delimiters (both have to be the same, unless you use a bracket-style pair such as ( ) or { }).
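
To make the difference between the two patterns concrete, here is a small sketch. The second input line is made up for illustration and does not come from the question.

```php
<?php
$line = './t/fla/8204909_flat/SPSTANDARD.8204909_flat.jpg';

// Greedy version: removes everything up to and including the last slash.
$greedy = preg_replace('#^.*/#', '', $line);

// Stricter version: only removes a prefix with the expected shape.
$strict = preg_replace('#^\./\w/\w{3}/\w{1,13}/#', '', $line);

// Hypothetical line in some other format: the strict pattern leaves it
// alone, while the greedy one still trims it.
$other = 'images/archive/cover.jpg';
$otherGreedy = preg_replace('#^.*/#', '', $other);
$otherStrict = preg_replace('#^\./\w/\w{3}/\w{1,13}/#', '', $other);

var_dump($greedy, $strict, $otherGreedy, $otherStrict);
```

Both patterns give the same result on well-formed lines; they only differ on lines that do not follow the expected layout.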

2 Comments

That's much cleaner, the lines should all be in the same format but I don't want to assume that. I used the second version as it's simpler and cleaner, just needed to change to [\w-] to account for hyphens as well. Am I right in assuming that \w is alphanumeric characters and underscores?
Yes, \w is the same as [A-Za-z0-9_]. In some other regex flavors it also matches accented letters plus letters and digits from other writing systems, but by default PHP's \w is limited to ASCII.
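
A quick sketch of the [\w-] change mentioned above. The sample line with a hyphen is hypothetical, since the question does not show one.

```php
<?php
// Hypothetical line with a hyphen in the fourth part.
$line = './a/abc/12345-flat/SPSTANDARD.12345-flat.jpg';

// \w alone does not match '-', so the pattern fails and nothing is replaced.
$withW = preg_replace('#^\./\w/\w{3}/\w{1,13}/#', '', $line);

// [\w-] adds the hyphen to the allowed characters.
$withHyphen = preg_replace('#^\./\w/\w{3}/[\w-]{1,13}/#', '', $line);

var_dump($withW === $line); // bool(true): the line is unchanged
var_dump($withHyphen);      // string: "SPSTANDARD.12345-flat.jpg"
```

Placing the hyphen last inside the brackets keeps it from being read as a range.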
0

The $ means the end of the string. So your pattern would only match ./1/024/9780310320241/ if it were alone on its line (./t/fla/8204909_flat/ would not match even then, because the last character class does not allow _). Remove the $ and it will match the first four parts of your string, replacing them with a space.
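
The effect of the anchor can be checked with preg_match(). This is only a sketch; the variable names are illustrative.

```php
<?php
// The question's original pattern, including the trailing $.
$anchored = '/^(\.\/[0-9a-zA-Z]{1}\/[0-9a-zA-Z]{3}\/[0-9a-zA-Z]{1,13}\/)$/';

$prefixOnly = './1/024/9780310320241/';
$fullLine   = './1/024/9780310320241/SPSTANDARD.9780310320241.jpg';

// Matches only when the directory prefix is the entire string.
$matchesPrefix = preg_match($anchored, $prefixOnly); // 1
$matchesFull   = preg_match($anchored, $fullLine);   // 0

// Same pattern without the $: the prefix is found and replaced by a space.
$unanchored = '/^(\.\/[0-9a-zA-Z]{1}\/[0-9a-zA-Z]{3}\/[0-9a-zA-Z]{1,13}\/)/';
$result = preg_replace($unanchored, ' ', $fullLine);

var_dump($matchesPrefix, $matchesFull, $result);
```

Note the leading space in the result, which comes from the " " replacement used in the question.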

0
$pattern = "/(\.\/[0-9a-z]{1}\/[0-9a-z]{3}\/[0-9a-z\_]+\.(jpg|bmp|jpeg|png))\n/is";

I just saw that your example string doesn't end with /, so maybe you should remove it from the end of your pattern. Also, the underscore is used in the filename and should be in the character class.
