
I am working on a final year project. I have a file that contains some text, and I need to get the words from this file that carry the "//jj" tag, e.g. abc//jj, bcd//jj, etc.

Suppose the file contains the following text:

ffafa adada//bb adad ssss//jj aad adad adadad aaada dsdsd//jj dsdsd sfsfhf//vv dfdfdf

I need all the words that are associated with the //jj tag. I have been stuck on this for the past few days. Here is the code I am trying:

// Create OpenFileDialog
Microsoft.Win32.OpenFileDialog dlg = new Microsoft.Win32.OpenFileDialog();

// Set filter for file extension and default file extension
dlg.DefaultExt = ".txt";
dlg.Filter = "Text documents (.txt)|*.txt";

// Display OpenFileDialog by calling ShowDialog method
Nullable<bool> result = dlg.ShowDialog();

// Get the selected file name and display in a TextBox
string filename = string.Empty;
if (result == true)
{
    // Open document
    filename = dlg.FileName;
    FileNameTextBox.Text = filename;
}

string text;
using (var streamReader = new StreamReader(filename, Encoding.UTF8))
{
    text = streamReader.ReadToEnd();
}

string FilteredText = string.Empty;

string pattern = @"(?<before>\w+) //jj (?<after>\w+)";

MatchCollection matches = Regex.Matches(text, pattern);

for (int i = 0; i < matches.Count; i++)
{
    FilteredText="before:" + matches[i].Groups["before"].ToString();
    //Console.WriteLine("after:" + matches[i].Groups["after"].ToString());
}

textbx.Text = FilteredText;

I can't get the result I need. Please help me.

  • Before I try to help with this, can I just ask you to confirm that the requirements of the project allow you to ask for help from outside sources like SO? Commented Jan 14, 2016 at 16:09
  • What is your code producing as is? Commented Jan 14, 2016 at 16:10

4 Answers


With LINQ you could do this with one line:

string[] taggedwords = input.Split(' ').Where(x => x.EndsWith(@"//jj")).ToArray();

All of your //jj words will then be in taggedwords.
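For reference, here is the one-liner wrapped in a small self-contained class. The class and method names are made up for illustration. One deliberate change from the answer: it splits on any white-space character instead of only ' ', on the assumption that the real file may also contain tabs or line breaks between words.

```csharp
using System;
using System.Linq;

public static class TaggedWordFinder
{
    // Returns every white-space-separated token that ends with "//jj".
    public static string[] FindTaggedWords(string input)
    {
        return input
            // A null separator makes Split treat any Unicode white-space
            // character (space, tab, line break) as a delimiter;
            // RemoveEmptyEntries drops empty tokens from repeated separators.
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            // Ordinal comparison avoids culture-specific matching rules.
            .Where(x => x.EndsWith("//jj", StringComparison.Ordinal))
            .ToArray();
    }
}
```

For example, TaggedWordFinder.FindTaggedWords(text) on the sample line from the question returns ssss//jj and dsdsd//jj.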


Comments

Nice! You've reminded me to go and read my "LINQ for Dummies" book again ;-)
I just mean I'm jealous that it didn't occur to me to use LINQ as I've been learning to use it recently :-)
@Equalsk Ah, I see ;) so that is what you mean. No worries, I will still upvote your answer so it doesn't go to waste. :) Your answer is actually easier for a new programmer to understand than LINQ; it is still a nice answer. :D
Very helpful, thanks.
I am stuck again, because the text in my file is in other languages such as Urdu and Arabic, not English. The code in both of your answers fails to get the specific words because it can't split the text.

Personally, I think a regex is overkill if that's definitely how the string will look. You haven't said that you must use a regex, so why not try this instead?

// A list that will hold the words ending with '//jj'
List<string> results = new List<string>();

// The text you provided
string input = @"ffafa adada//bb adad ssss//jj aad adad adadad aaada dsdsd//jj dsdsd sfsfhf//vv dfdfdf";

// Split the string on the space character to get each word
string[] words = input.Split(' ');

// Loop through each word
foreach (string word in words)
{
    // Does it end with '//jj'?
    if(word.EndsWith(@"//jj"))
    {
        // Yes, add to the list
        results.Add(word);
    }
}

// Show the results
foreach(string result in results)
{
    MessageBox.Show(result);
}

Results are:

ssss//jj
dsdsd//jj

Obviously this is not quite as robust as a regex, but you didn't provide any more detail for me to go on.

Comments

In what way is Regex overkill?
Because it's complex to read and understand unless you know what you're doing, which it seems the OP does not. The code above is more of a beginner's approach to getting the same result and is much easier to understand, IMHO. I'm not saying Regex is slower, if that's what you mean.
My solution was similar, but used LINQ to get the same result: results = words.Where(s => s.Trim().EndsWith(@"//jj")).ToArray();
I love a good bit of LINQ, I just wanted to keep it nice and simple because, well, I'm simple!
I am stuck again, because the text in my file is in other languages such as Urdu and Arabic, not English. The code in both of your answers fails to get the specific words because it can't split the text.

You have an extra space in your regex: it assumes there's a space before "//jj". What you want is:

 string pattern = @"(?<before>\w+)//jj (?<after>\w+)";
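A sketch of the question's loop using the corrected pattern. Besides the regex fix, it collects every match into a list, since the original loop assigns FilteredText on each iteration and so keeps only the last match. The class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class JjRegexExample
{
    // Pattern from this answer: no space between the word and "//jj".
    // It still requires a space and another word after the tag, so a
    // tagged word at the very end of the text will not match.
    private const string Pattern = @"(?<before>\w+)//jj (?<after>\w+)";

    // Collects the "before" group of every match instead of
    // overwriting a single string on each loop iteration.
    public static List<string> WordsBeforeTag(string text)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(text, Pattern))
        {
            results.Add(m.Groups["before"].Value);
        }
        return results;
    }
}
```

In the WPF code from the question, the result could then be shown with textbx.Text = string.Join(Environment.NewLine, JjRegexExample.WordsBeforeTag(text)); on the sample text this gives ssss and dsdsd.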


This regular expression will yield the words you are looking for:

string pattern = "(\\S*)\\/\\/jj";

A bit nicer without backslash escaping:

(\S*)\/\/jj

Each match will include the //jj, but you can get the word itself from the first bracketed group.
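A minimal sketch showing how to read the first bracketed group from each match, using the pattern from this answer as a verbatim string. The class name is illustrative. Since \S matches any non-white-space character, the pattern should not depend on whether the words are written in Latin, Arabic or Urdu script.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class JjGroupExample
{
    public static List<string> TaggedWords(string text)
    {
        var words = new List<string>();
        foreach (Match m in Regex.Matches(text, @"(\S*)\/\/jj"))
        {
            // Groups[0] is the whole match, e.g. "ssss//jj";
            // Groups[1] is the first bracketed group, e.g. "ssss".
            words.Add(m.Groups[1].Value);
        }
        return words;
    }
}
```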

1 Comment

If abc abc//ss abc//gg //jj should return 0 matches, then (\w+)\/\/jj also works.
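A quick way to check the difference described in this comment, using the comment's own example string (the class and method names are hypothetical):

```csharp
using System.Text.RegularExpressions;

public static class PatternComparison
{
    public const string Input = "abc abc//ss abc//gg //jj";

    // (\S*) may match zero characters, so the bare "//jj" still counts.
    public static int CountWithStarPattern() =>
        Regex.Matches(Input, @"(\S*)\/\/jj").Count;

    // (\w+) needs at least one word character directly before "//jj".
    public static int CountWithPlusPattern() =>
        Regex.Matches(Input, @"(\w+)\/\/jj").Count;
}
```

CountWithStarPattern() returns 1, because (\S*) can match an empty string right before the bare //jj, while CountWithPlusPattern() returns 0.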
