cls
$filename = "c:\Users\Neal\OneDrive\Documents\myFile.html"

#example of what I'm trying to pick out:
#<strong>mydomainname.com</strong>
#note: the pattern below only matches when </strong> is immediately followed by </a>
$regexPattern = "<strong>(.*?)</strong></a>"

gc $filename | Select-String -Pattern $regexPattern -AllMatches | ForEach-Object {$_.matches.groups[1].value}

Note that each match returns two groups, with subscripts 0 and 1. Subscript 0 contains the entire matched text, including the surrounding tags. Subscript 1 contains just the captured text. Thus I put groups[1].value in the logic above. Each group is an object that has several properties; “Value” is the one we need here (see related blogs below).
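Here is a small sketch of what the two groups hold. The sample line and its two domain names are made up for illustration; they are not from my file. It also loops over Matches one at a time, because when a single line contains several matches, groups[1] on its own may not give back one capture per match.

```powershell
# Hypothetical sample line with two matches on it (not from the original file)
$sampleLine = '<a href="#"><strong>one.example</strong></a> <a href="#"><strong>two.example</strong></a>'
$regexPattern = "<strong>(.*?)</strong></a>"

$result = $sampleLine | Select-String -Pattern $regexPattern -AllMatches

# Groups[0] is the whole match, tags included; Groups[1] is the captured text
$first = $result.Matches[0]
$first.Groups[0].Value   # <strong>one.example</strong></a>
$first.Groups[1].Value   # one.example

# Walk every match on the line so that none are skipped
$domains = foreach ($m in $result.Matches) { $m.Groups[1].Value }
$domains
```

The same loop can go inside the ForEach-Object block when reading from a file, if the HTML puts more than one link on a line.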

We can take it to the next level and generate SQL statements to insert those domains into a SQL table.
This is done with one long line of code, using the pipeline (piping).

gc $filename | Select-String -Pattern $regexPattern -AllMatches  |  ForEach-Object {Write-Host "insert into domains values ('$($_.matches.groups[1].value)')"} 

The output is a list of insert statements, one for each matching domain name, written to the console.
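If the statements need to go somewhere other than the console, one option is to emit them as plain strings instead of using Write-Host. The sketch below assumes a helper function, New-DomainInsert, that I made up for this example; the sample values and the output file name are also assumptions. It doubles any single quote in the value so that the generated SQL string literal stays valid.

```powershell
# Hypothetical helper (not part of the original script): one insert statement per domain
function New-DomainInsert {
    param([string]$Domain)
    # Double any single quote so the value stays inside the SQL string literal
    $escaped = $Domain.Replace("'", "''")
    "insert into domains values ('$escaped')"
}

# Sample values for illustration only
$statements = 'mydomainname.com', "o'brien.example" | ForEach-Object { New-DomainInsert $_ }
$statements

# To save the statements instead of only displaying them (file name is an assumption):
# $statements | Set-Content -Path .\insert-domains.sql
```

Because the function returns strings on the pipeline, the result can be displayed, stored in a variable, or piped to a file.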

References that helped me get this:

https://powershell.org/forums/topic/how-to-get-a-regex-group-from-a-select-string-cmdlet/

https://stackoverflow.com/questions/25064249/command-line-to-extract-all-domain-names-referenced-in-a-file

See also my related blog on PowerShell Regex and the objects that it returns (below).

 

 
