I have the following raw content in a file and just want to print the list of all URLs. I have written a script that reads the file and loops over it with ForEach (line in lines), but I do not know how to filter out just the URL from each line. Any thoughts?

Line 18942:         "url": "http://harvardpolitics.com/tag/brussels/",
Line 18994:         "url": "http://203.36.101.164/4f64555b4217b47b7c64b3fec19e389b/1502455203/Telstra/Foxtel-Vod/fxmultismvod5256/store2/ON307529/ON307529_hss.ism/QualityLevels(791000)/Fragments(video=9900000000)"
Line 19044:         "url": "https://www.gucci.com/int/en/ca/women/handbags/womens-shoulder-bags-c-women-handbags-shoulder-bags?filter=%3ANewest%3Acolors%3AGold%7Ccb9822",
Line 19096:         "url": "https://bagalio.cz/batohy-10l?cat=3p%3D1urceni%3D2582p%3D1kapsa_ntb_velikost%3D2179p%3D1manufacturer%3D1302p%3D1color%3D84p=1kapsa_ntb_velikost=2192",
Line 19148:         "url": "http://www.csillagjovo.gportal.hu/gindex.php?pg=31670155",
Line 19200:         "url": "http://www.copiersupplystore.com/hp/color-laserjet-4700dn/j7934a-j7934ar",
  • Where do the line numbers come from: are they in the file, or did you add them? It looks like part of a JSON file. If so, use ConvertFrom-Json. Commented Aug 11, 2017 at 23:54
  • That's right, they are the response from an API, returned as a JSON blob. I filtered it in Notepad++ on "url" and a list of around 400 URLs showed up. I tried to parse them, but nothing worked. I will try ConvertFrom-Json and see if it works. Commented Aug 12, 2017 at 20:14
  • By the way, Invoke-RestMethod (unlike Invoke-WebRequest) implicitly converts JSON API responses into PowerShell objects. Commented Aug 12, 2017 at 22:24
  • Invoke-RestMethod worked, came in handy, and is a better solution than Invoke-WebRequest. I appreciate your help. Commented Aug 13, 2017 at 14:58
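
As a rough illustration of the ConvertFrom-Json suggestion from the comments, here is a minimal sketch. The structure of the real API response is not shown in the question, so the sample below assumes a flat array of objects that each carry a "url" property; the variable names are hypothetical as well.

```powershell
# Hypothetical sample standing in for the API response.
# Assumption: a flat array of objects, each with a "url" property.
$json = @'
[
  { "url": "http://harvardpolitics.com/tag/brussels/" },
  { "url": "http://www.csillagjovo.gportal.hu/gindex.php?pg=31670155" }
]
'@

# ConvertFrom-Json turns the JSON text into PowerShell objects.
$items = $json | ConvertFrom-Json

# Member enumeration (PowerShell 3.0+) collects the url property of every element.
$urls = $items.url
$urls
```

If the JSON is stored in a file, reading it as one string with Get-Content -Raw before piping it to ConvertFrom-Json should work the same way. If the "url" properties are nested deeper in the real response, the property path would need to be adjusted accordingly.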

3 Answers

One way is to use the Substring method; another is to use a regex.

$Text = Get-Content D:\Test\test.txt
foreach ($Line in $Text) {
    # SubString Version
    $FirstIndex = $Line.IndexOf('http')
    $URLLength = ($Line.LastIndexOf('"') - $FirstIndex)
    $Line.Substring($FirstIndex, $URLLength)

    # Regex Version 
    $Regex = '(http[s]?|[s]?ftp[s]?)(:\/\/)([^\s,]+)'
    [regex]::Matches($Line, $Regex) | ForEach-Object { $_.Value.TrimEnd('"') }
}
1 Comment

I tried it, but it doesn't output anything. I also tried writing the output to a file, and the file is empty.

Try this out to just get the urls:

$content = Get-Content <file-with-output> # or other way of getting the data

$urls = $content | ForEach-Object { ($_ -replace ".+?(?=http.+)","").Trim('",')}

Edit: Added $urls to catch result.

4 Comments

Just throwing in $_ -replace '^.*(http[^"]+).*$', '$1' as a simpler regex approach (no lookaround, no trim)
My regex is a bit weak, thank you for showing me a better way.
I tried the regex, but it only outputs one URL, the one on line 19200. Could it be something to do with how the data was copied into the file? As I mentioned above, the data is an API response (a JSON blob) that I filtered in Notepad++ on "url", which gave a list of around 400 URLs, and nothing I tried could parse them. I will also try ConvertFrom-Json.
Thank you all. I used ConvertFrom-Json and everything worked fine. All the above solutions worked well for parsing the URLs and writing them to a file. I appreciate your help in resolving this.
$Urls = Get-Content file.txt | ForEach-Object { $_.Split('"')[3] }
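
The one-liner above relies on the URL always being the fourth quote-delimited field. A slightly more defensive variant, sketched here, matches the "url" key explicitly and skips lines that do not contain it. The first and last sample lines are copied from the question; the middle "title" line is a made-up example added only to show that non-URL lines are ignored, and the variable names are hypothetical.

```powershell
# Sample lines in the filtered format from the question.
# Assumption: the "title" line is invented to show that non-url lines are skipped.
$lines = @(
    'Line 18942:         "url": "http://harvardpolitics.com/tag/brussels/",'
    'Line 19000:         "title": "not a url line",'
    'Line 19200:         "url": "http://www.copiersupplystore.com/hp/color-laserjet-4700dn/j7934a-j7934ar",'
)

# Capture everything between the quotes that follow the "url" key.
$pattern = '"url":\s*"([^"]+)"'

$urls = foreach ($line in $lines) {
    # -match fills the automatic $Matches variable; group 1 holds the URL.
    if ($line -match $pattern) { $Matches[1] }
}
$urls
```

To run this against a file rather than the inline sample, $lines could be replaced with the output of Get-Content, as in the answers above.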
