cls
$filename = "c:\Users\Neal\OneDrive\Documents\myFile.html"
#example of what I'm trying to pick out <strong>mydomainname.com</strong>
$regexPattern = "<strong>(.*?)</strong></a>"
gc $filename | Select-String -Pattern $regexPattern -AllMatches | ForEach-Object {$_.matches.groups[1].value}
Note that each match returns two groups, with subscripts 0 and 1. Subscript 0 contains the entire match, including the surrounding “strong” tags. Subscript 1 contains just the captured text, which is why I used groups[1].value in the logic above. Each group is an object with several properties; “Value” is the one we need here (see related blogs below).
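To make the difference between the two subscripts concrete, here is a minimal sketch that runs the same pattern against a single string instead of a file. The sample line and the variable names $sampleLine, $result and $firstMatch are made up for illustration; they are not from my real HTML file.

```powershell
# Hypothetical sample line standing in for one line of the HTML file.
$sampleLine = '<a href="#"><strong>mydomainname.com</strong></a>'
$regexPattern = "<strong>(.*?)</strong></a>"

# Select-String also accepts strings from the pipeline, not just files.
$result = $sampleLine | Select-String -Pattern $regexPattern -AllMatches
$firstMatch = $result.Matches[0]

$firstMatch.Groups[0].Value   # the whole match, tags included
$firstMatch.Groups[1].Value   # only the text captured by (.*?)
```

The first value printed is the full matched text with its tags, and the second is just mydomainname.com.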
We can take it to the next level and generate SQL statements to insert those domains into a SQL table.
This is done with one long line of code, using the pipeline (piping).
gc $filename | Select-String -Pattern $regexPattern -AllMatches | ForEach-Object {Write-Host "insert into domains values ('$($_.matches.groups[1].value)')"}
The output is a list of insert statements, one for each matching domain name, written to the console.
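One thing to watch: indexing with groups[1] picks a single element per input line, so if a line of HTML holds more than one domain, the extra matches may be missed even with -AllMatches. The sketch below is one possible variation, not the only way to do it. It walks every match on every line. The $lines sample data and the $statements variable are hypothetical, made up for this example.

```powershell
# Hypothetical input: the first line contains two domains, the second contains one.
$lines = @(
    '<a><strong>first.com</strong></a> <a><strong>second.com</strong></a>',
    '<a><strong>third.com</strong></a>'
)
$regexPattern = "<strong>(.*?)</strong></a>"

$statements = $lines |
    Select-String -Pattern $regexPattern -AllMatches |
    ForEach-Object { $_.Matches } |       # unroll every match on the line
    ForEach-Object {
        # Groups[1] is the captured domain for this particular match.
        "insert into domains values ('$($_.Groups[1].Value)')"
    }

$statements
```

Because the statements are emitted as plain strings rather than through Write-Host, they can also be piped on to Set-Content or Out-File to save them as a .sql script.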
References that helped me get this:
https://powershell.org/forums/topic/how-to-get-a-regex-group-from-a-select-string-cmdlet/
See also my related blog on PowerShell Regex and the objects that it returns (below).