6

Suppose I have a collection of substrings, for example:

string a = {"cat","sensitive","ate","energy","tense"}

Then the output for this should be as follows:

catensesensitivenergy

How can I do this?

10
  • 2
    In what language? Is this just a theoretical question (e.g. the kind you'd find in schoolwork) for which the language doesn't matter much? Commented Dec 15, 2014 at 13:19
  • 2
    What are the rules for constructing the string? Maximum overlap of substrings? Commented Dec 15, 2014 at 14:06
  • to be very specific language does not matter. Commented Dec 15, 2014 at 14:12
  • we can use native language c. The reason behind asking for algorithm is there should be not or minimum use of function, we can use stdin.h and stdio.h function Commented Dec 15, 2014 at 14:29
  • 2
    Should the output actually be catensensitivenergy? Commented Dec 15, 2014 at 14:37

3 Answers 3

5

This problem is known as the shortest common superstring problem and it is NP-hard, so if you need an exact solution you cannot do much better then trying all possibilities and choosing the best one.

One possible exponential solution is to generate all permutations of the input strings, find the shortest common superstring greedily for each permutation(a permutation specifies the order of strings and it is possible to prove that for a fixed order greedy algorithm always works correctly) and choose the best one.

Sign up to request clarification or add additional context in comments.

3 Comments

@AndyG supersequence and superstring are different things, actually.
"So you cannot do much better than trying all possibilities and choosing the best one." This is an over-simplification of what NP-hard means. Often totally exhaustive solutions can be beaten quite well, albeit not in provable polynomial time, e.g. for factoring integers which may very well be an NP-hard problem yet there are known solutions that are provably so much faster than trial division.
To be more specific, the problem at hand is NP-hard because it reduces to finding the shortest hamiltonian path in a directed graph
3

Using user2040251 suggestion:

#include <string>
#include <iostream>
#include <algorithm>

std::string merge_strings( const std::vector< std::string > & pool )
{
    std::string retval;
    for( auto s : pool )
        if( retval.empty() )
            retval.append( s );
        else if( std::search( retval.begin(), retval.end(), s.begin(), s.end() ) == retval.end() )
        {
            size_t len = std::min( retval.size(), s.size() ); 
            for( ; len; --len )
                if( retval.substr( retval.size() - len ) == s.substr( 0, len ) )
                {
                    retval.append( s.substr( len ) );
                    break;
                }
            if( !len )
                retval.append( s );
        }
    return retval;           
}

std::string shortest_common_supersequence( std::vector< std::string > & pool )
{
    std::sort( pool.begin(), pool.end() );

    std::string buffer;
    std::string best_reduction = merge_strings( pool );
    while( std::next_permutation( pool.begin(), pool.end() ) )
    {
        buffer = merge_strings( pool );
        if( buffer.size() < best_reduction.size() )
            best_reduction = buffer;
    }
    return best_reduction;
}


int main( int argc, char ** argv )
{
    std::vector< std::string > a{"cat","sensitive","ate","energy","tense"};
    std::vector< std::string > b{"cat","sensitive","ate","energy","tense","sit"};
    std::vector< std::string > c{"personal","ate","energy","tense","gyroscope"};
    std::cout << "best a --> \"" << shortest_common_supersequence( a ) << "\"\n";
    std::cout << "best b --> \"" << shortest_common_supersequence( b ) << "\"\n";
    std::cout << "best c --> \"" << shortest_common_supersequence( c ) << "\"\n";

    return 0;
}

Output:

best a --> "catensensitivenergy"
best b --> "catensensitivenergy"
best c --> "atensenergyroscopersonal"

1 Comment

By checking every permutation arent you suboptimal ? Most of the time the underlying graph of the words won't be complete making our life a bit easier :)
1

Break the problem down and see what we got. Starting with only two strings. We must check which suffix of one string is the longest prefix of the other. This gives us the order for the best concatenation.

Now, with a set of n word, how do we do ? We start by building a trie containing every word (a key for each word). If a word is a duplicate of an other we can easily flag it as such while building the prefix tree.

I made a quick implementation of a regular Trie. You can find it here.

We have the tools to build a directed graph linking the different words wether a suffix of the first is a prefix of the second. The weight of the edge is the length of the suffix.

To do so, for each word w of the input set, we must see which words we can reach with a suffix of w :

  • We walk down the trie using the suffix. We will end up in a node (or not).
  • From this node, provided it exists, we scan the remaining subtree to see which words are available.
    • If a given suffix of length l yields a match with a prefix of word w', then we add an edge w → w', with weight length(w') - l.
    • If such an edge already exists, we just update the weight to keep the lowest.

From there, the graph is set and we must find the shortest path that runs through every vertex (eg. word) only once. If the graph is complete, this is the Traveling Salesman Problem. Most of the times, the graph won't be complete.

Still, it remains a NP-hard problem. In more "technical" terms, the problem at hand is to find the shortest hamiltonian path of a digraph.

Note : Given an Hamiltonian path (if it exists) with its cost C, and its starting vertex (word) W, the supestring length is given by :

Lsuper = LW + C

Note : If two words have no suffix linking them to another word, then the graph is not connected and there is no hamiltonian path.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.