This is Perl, so There's More Than One Way To Do It (TMTOWTDI). Here's one of them:
#!/usr/bin/env perl
use strict;
use warnings;

my $regex = "heavenn+";
my $rx = qr/$regex/;
print "Regex: $regex\n";

my $file = "myfilename.txt";
my %list;
my @myarr;

# $! is the system error message for the failed open
open my $fh, "<", $file or die "Failed to open $file: $!";
while ( my $line = <$fh> )
{
    if ($line =~ $rx)
    {
        print $line;
        $list{$line}++;    # the key still includes the trailing newline
    }
}

push @myarr, sort keys %list;
print "Mylist: @myarr\n";
Sample output:
Regex: heavenn+
heavenns
heavenns
heavennly
Mylist: heavennly
heavenns
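The sort isn't necessary (but it presents the data in a sane order). Instead of collecting the keys at the end, you could push each line onto the array the first time it is seen, that is, when the count in $list{$line} is still 0 before the increment. You could chomp the input lines to remove the newline. Etc.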
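Those last two suggestions can be combined, as a minimal sketch. The helper name unique_matches and the sample lines are made up for illustration; they are not part of the original script.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: return the distinct matching lines in first-seen order.
sub unique_matches
{
    my ($rx, @lines) = @_;
    my (%seen, @found);
    for my $line (@lines)
    {
        chomp $line;                 # drop the trailing newline, if any
        next unless $line =~ $rx;
        # Post-increment returns the old count, so the push happens
        # only the first time a given line is seen.
        push @found, $line unless $seen{$line}++;
    }
    return @found;
}

# Made-up sample data standing in for the lines of the input file
my @sample = ("heavenns\n", "heaven\n", "heavenns\n", "heavennly\n");
my @myarr  = unique_matches(qr/heavenn+/, @sample);
print "Mylist: @myarr\n";
```

Because the array is filled in first-seen order, no sort is involved; add one if you want alphabetical output.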
What if I want to push only particular words? For example, if my file is 1. "heavenns hello" 2. "heavenns hi", "3.heavennly good". What to do to print only 'heavenns' and 'heavennly'?
Then you have to arrange to capture the word only. That means refining the regex. Assuming you want heavenn at the start of the word and don't mind what alphabetic characters come after that, then:
#!/usr/bin/env perl
use strict;
use warnings;

my $regex = '\b(heavenn[A-Za-z]*)\b'; # Single quotes necessary!
my $rx = qr/$regex/;
print "Regex: $regex\n";

my $file = "myfilename.txt";
my %list;
my @myarr;

# $! is the system error message for the failed open
open my $fh, "<", $file or die "Failed to open $file: $!";
while ( my $line = <$fh> )
{
    if ($line =~ $rx)
    {
        print $line;
        $list{$1}++;    # key is the captured word, not the whole line
    }
}

push @myarr, sort keys %list;
print "Mylist: @myarr\n";
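The "Single quotes necessary!" comment matters because \b means different things in the two kinds of string: in single quotes the backslash survives and the regex engine sees a word-boundary assertion, whereas in double quotes "\b" is a single backspace character. The following is a small sketch of the difference; the helper first_word and the sample text are hypothetical and used only for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: return the first captured word, or undef if no match.
sub first_word
{
    my ($pattern, $text) = @_;
    return $text =~ qr/$pattern/ ? $1 : undef;
}

my $single = '\b(heavenn[A-Za-z]*)\b';   # backslash-b reaches the regex engine
my $double = "\b(heavenn[A-Za-z]*)\b";   # each \b is already a backspace character

# The double-quoted string is two characters shorter: each \b collapsed to one byte.
printf "Lengths: single=%d double=%d\n", length($single), length($double);

my $text = "heavenns hello";
for my $pattern ($single, $double)
{
    my $word = first_word($pattern, $text);
    print defined $word ? "Matched: $word\n" : "No match\n";
}
```

The double-quoted version fails to match because the sample text contains no backspace characters.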
Data file:
1. "heavenns hello"
2. "heavenns hi",
"3.heavennly good". What to d
heaven
heavenly
heavenns
abc
heavenns
heavennly
Output:
Regex: \b(heavenn[A-Za-z]*)\b
1. "heavenns hello"
2. "heavenns hi",
"3.heavennly good". What to d
heavenns
heavenns
heavennly
Mylist: heavennly heavenns
Note that the names in the list no longer include newlines, because the hash keys are now the captured words in $1 rather than whole input lines.
After a chat
This version takes a regex from the command line. The script invocation is:
perl script.pl -p 'regex' [file ...]
It will read from standard input if no file is specified on the command line (better than having a fixed input file name — by a large margin). It looks for multiple occurrences of the specified regex on each line, where the regex can be preceded by or followed by (or both) 'word characters' as specified by \w.
#!/usr/bin/env perl
use strict;
use warnings;
use Getopt::Std;

my %opts;
getopts('p:', \%opts) or die "Usage: $0 [-p 'regex'] [file ...]\n";

my $regex_base = 'heavenn';
#$regex_base = $ARGV[0] if defined $ARGV[0];
$regex_base = $opts{p} if defined $opts{p};

# Allow word characters before and after the base pattern
my $regex = '\b(\w*' . ${regex_base} . '\w*)\b';
my $rx = qr/$regex/;
print "Regex: $regex (compiled form: $rx)\n";

my %list;
my @myarr;

while (my $line = <>)
{
    # With /g in scalar context, each iteration finds the next match on the line
    while ($line =~ m/$rx/g)
    {
        print $line;
        $list{$1}++;
        #$line =~ s///;
    }
}

push @myarr, sort keys %list;
print "Matched words: @myarr\n";
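In the script above, print $line sits inside the inner while loop, so a line containing several matches is printed once per match. If that repetition is unwanted, one alternative is to evaluate m//g in list context, which returns every captured word on the line in one go. This is a sketch of that variation, not the original script's behaviour; count_words is a hypothetical helper name and the input lines are hard-coded for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: count each word captured by $rx across the given lines.
sub count_words
{
    my ($rx, @lines) = @_;
    my %count;
    for my $line (@lines)
    {
        # In list context, m//g returns the captured text of every match on the line.
        my @words = $line =~ m/$rx/g;
        $count{$_}++ for @words;
    }
    return %count;
}

my $rx   = qr/\b(\w*heavenn\w*)\b/;
my %list = count_words($rx,
    "Good heavennsy! What a heavennnly output from an equally heavennnnly input!\n",
    "heavenns\n",
    "heavenns\n",
);
my @myarr = sort keys %list;
print "Matched words: @myarr\n";
```

The hash still records how many times each word occurred, so the counts remain available if you need them.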
Given the input file:
1. "heavenns hello"
2. "heavenns hi",
"3.heavennly good". What to d
Good heavennsy! What a heavennnly output from an equally heavennnnly input!
An unheavenly host. Good heavens! It heaves to like a yacht!
heaven
Is it heavens
heavenly
heavenns
abc
heavenns
heavennly
You can get outputs such as:
$ perl script.pl -p 'e\w*?ly' myfilename.txt
Regex: \b(\w*e\w*?ly\w*)\b (compiled form: (?^:\b(\w*e\w*?ly\w*)\b))
"3.heavennly good". What to d
Good heavennsy! What a heavennnly output from an equally heavennnnly input!
Good heavennsy! What a heavennnly output from an equally heavennnnly input!
Good heavennsy! What a heavennnly output from an equally heavennnnly input!
An unheavenly host. Good heavens! It heaves to like a yacht!
heavenly
heavennly
Matched words: equally heavenly heavennly heavennnly heavennnnly unheavenly
$ perl script.pl myfilename.txt
Regex: \b(\w*heavenn\w*)\b (compiled form: (?^:\b(\w*heavenn\w*)\b))
1. "heavenns hello"
2. "heavenns hi",
"3.heavennly good". What to d
Good heavennsy! What a heavennnly output from an equally heavennnnly input!
Good heavennsy! What a heavennnly output from an equally heavennnnly input!
Good heavennsy! What a heavennnly output from an equally heavennnnly input!
heavenns
heavenns
heavennly
Matched words: heavennly heavennnly heavennnnly heavenns heavennsy
$
"heavenn\+" and "heavenn+" produce the same string. heavenn and heavenn+ are the same if you don't capture what is matched.
I doubt that csh has anything to do with the behaviour of your Perl script (unless you're doing something insane like creating the script as a string on the command line; then anything is possible given the metasyntactic zoo of operators in the csh). Of course, there are those (including me) who'd argue that you shouldn't be using csh in the first place, but that's your funeral, not mine.