4
$\begingroup$

I am an applied math undergraduate student. On my project, I come across an integer linear programming question as follow:

Given $x_0,x_1,...,x_n$: $\forall$ i $\in$ [0,n], $x_i$ = 0 or 1

min Z = $\sum_{i=0}^n x_i $

with m numbers of constraints in the format $\sum x_j \ge 1$: j $\in$ [0,n].

I know this is generally a problem with NP hardness but I was wondering if there is an effective algorithm to solve this problem.

$\endgroup$
8
  • $\begingroup$ This appears to be a form of Set Cover problem. The $n+1$ variables $x_i$ play the role of the sets, and the $m$ constraints are the points to be covered. Choosing the minimum size subset of variables (so that at least one appears in each constraint) amounts to solving your problem. $\endgroup$ Commented Oct 28, 2016 at 23:41
  • $\begingroup$ Yes but is there a P algorithm for this special case? I have around 100k of variables and 400k of constraints and if I use NP algorithm it would nearly be impossible for the code to terminate $\endgroup$ Commented Oct 29, 2016 at 0:29
  • $\begingroup$ I posted my answer, that for constraints of the general type you describe, the minimization of a binary linear program is NP-hard. It remains to consider if the constraints in your application have a special structure that makes the problem tractable. $\endgroup$ Commented Oct 29, 2016 at 19:29
  • $\begingroup$ If it is impractical to describe any special structure that your constraints might have, perhaps you can instead discuss the importance of an exact solution as opposed to an approximate one, for the application you have in mind. $\endgroup$ Commented Oct 30, 2016 at 19:47
  • $\begingroup$ That is a great suggestion. I would keep the question open so that I can continue to receive suggestions from others. I am current studying about minimum flow since I haven't figure out how network simplex could give me an integer solution. $\endgroup$ Commented Oct 31, 2016 at 2:29

2 Answers 2

2
$\begingroup$

The reduction of your problem to an instance of SetCover is perhaps clear to you. We will show that the reverse is also possible, to formulate any SetCover instance as a binary integer program of the kind you are asking about, and this will establish that the resulting minimization computation is NP-hard.

A good survey of approximate methods for such problems (including rounding from linear programming) is given by Fatema Akhter (2015), "A Heuristic Approach for Minimum Set Cover Problem", where a modified hill climbing algorithm is proposed and its performance compared on a library of SetCover problems.

First let's clear up the notation used for the constraints, because what was written in the Question: $$m \text{ numbers of constraints in the form } \sum x_j \ge 1: j \in [0,n]$$

does not adequately distinguish among different constraints. For each constraint there should be a subset $C_k \subseteq \{0,\ldots,n\}$ of indices over which the sum is taken:

$$ \sum_{j \in C_k} x_j \ge 1 $$

For convenience we will refer to this latter constraint by its index subset $C_k$, so that the collection of $m$ constraints is $\mathscr{U} = \{C_1,\ldots,C_m\}$.

This $\mathscr{U}$ will be the universe of our SetCover instance, and the family of sets from which a minimum cover will be found is $\mathscr{S} = \{S_0,S_1,\ldots,S_n\}$ where:

$$ S_i = \{ C_k : x_i \in C_k \} \text{ for } i=0,1,\ldots,n $$

Note that the sets $S_i$ are in one-to-one correspondence with your variables $x_i$. Let a minimum cover $\mathscr{S}' \subseteq \mathscr{S}$ of the universe be found, and set $x_i = 1$ if and only if $S_i \in \mathscr{S}'$, $x_i = 0$ otherwise. A little thought should convince us this assignment minimizes the objective:

$$ Z = \sum_{i=0}^n x_i $$

subject to the constraints $C_k$, which amount requiring each subset $C_k$ contains at least one variable $x_i$ which is assigned a value $1$.

In other words we have described how to choose which variables $x_i=1$ so that their sum $Z$ is minimized.

With a little further thought and explanation, this reduction can be reversed so than any SetCover instance can be expressed as a similar binary integer programming problem. Going in this direction we are given a finite universe $\mathscr{U}$ which we suggestively write as a set of $m$ points:

$$ \mathscr{U} = \{c_1,\ldots,c_m\} $$

and a finite nonempty family of subsets $\mathscr{S} = \{S_0,S_1,\ldots,S_n\}$ whose union is all of $\mathscr{U}$.

Define binary "indicator" variables $x_i \in \{0,1\}$ corresponding to the subsets $S_i$, so that for any choice of subfamily $\mathscr{S}' \subseteq \mathscr{S}$:

$$ x_i = \begin{cases} 1 & \text{if } S_i \in \mathscr{S}' \\ 0 & \text{otherwise} \end{cases} $$

Now as before we ask to minimize $Z = \sum_{i=0}^n x_i$ subject to the $m$ constraints:

$$ \sum_{i \in C_k} x_i \ge 1 $$

for $k = 1,\ldots,m$ where $C_k = \{ i : c_k \in S_i \}$.

A feasible solution $x_0,x_1,\ldots,x_n$ provides us a cover $\mathscr{S}'$ of the universe through the settings of indicator variables $x_i$. That is, since every constraint is satisfied, each $c_k \in \mathscr{U}$ will belong to at least one $S_i \in \mathscr{S}'$. A solution that minimizes $Z$ will at the same time minimize the size of $\mathscr{S}'$, since $Z = |\mathscr{S}'|$.


Having shown that your problem, at least in the generality described above, is NP-hard, it may be of interest to point out algorithms for approximate solutions to SetCover which produce in polynomial-time, covers that are within some factor of the true minimum cover size.

In the context of a universe of size $m$, the greedy algorithm (choose at each step a subset $S_i$ which contains the most points not yet covered) produces a solution whose size is at most $\log m$ times the true minimum size.

The Wikipedia article on SetCover which we linked above gives reference to a fairly recent paper by Dinur and Steurer (2013) which proves that polynomial-time algorithms cannot give much better approximation factors unless P = NP. This is the culmination of various investigations, showing "that the greedy algorithm is essentially the best-possible polynomial time approximation algorithm for set cover to lower order terms."

$\endgroup$
8
  • $\begingroup$ There is a contradiction between your answer and mine. You proved that the problem is NP-hard, and I proposed a polynomial time running algorithm for the problem. One of us is wrong, unless P=NP. $\endgroup$ Commented Oct 29, 2016 at 20:52
  • $\begingroup$ @Kuifje: One omission in your Answer is the distinction between continuous and discrete solutions. I will dig up an earlier post that shows how the continuous optimum can be quite far from a discrete (restricted to lattice) optimum. Anyway, something to think about. $\endgroup$ Commented Oct 29, 2016 at 21:00
  • $\begingroup$ Good point, but as a flow problem, the continuous relaxation of the problem should give an integer solution. $\endgroup$ Commented Oct 29, 2016 at 21:01
  • $\begingroup$ I am trying to read through both of these solutions since I don't have a lot of background on this. Can you explain how the flow problem can give you integer solution? @Kuifje $\endgroup$ Commented Oct 30, 2016 at 0:53
  • 1
    $\begingroup$ The consecutive one's case results in a totally unimodular matrix, and the LP optimal BFS is thus integral. When the matrix isn't totally unimodular, you can end up with fractional optimal solutions. I belief that's where Kuifje's argument breaks down. $\endgroup$ Commented Oct 30, 2016 at 17:51
2
$\begingroup$

Actually, this is not a difficult problem, it can be solved efficiently by interpreting it as a network flow problem. Suppose you have the following problem: $$ \min \; x_1+x_2+x_3 $$ subject to $$ x_1+x_2 \ge 1\\ x_1+x_3 \ge 1 \\ x_1,x_2,x_3 \in \{0,1\} $$

Now, create the following graph:enter image description here

Add a unitary cost on arcs $(s,1), (s,2), (s,3)$, and impose that the flow $f$ on all arcs ending in node $t$ is at least equal to $1$. Use any minimum cost flow algorithm to solve the problem and you are done.


EDIT $1$: There is a contradiction between this answer and @hardmath's answer: one of the two is not possible. The problem cannot be NP-hard and be solved in polynomial time. I suspect there is something wrong with my flow model, but I cannot figure out why. Ahuja and Magnanti (Networks Flows) have proposed shortest path algorithms for this problem if there are consecutive 1's in the columns of the matrix, but not for the general case. This is a hint that I believe something must be wrong with my flow model for the general case. Below is their network flow model if there are consecutive 1's in the columns of the matrix. See their book for more details.

enter image description here


EDIT $2$:

As @Erwin Kalvelagen pointed out, the above network flow model is wrong. The reason for that is that the inflow in node $s$ does not equal the objective function (the number of active variables). It counts the total number of times each variable is used, which is not what we want.

In summary:

  1. As proved below by @hardmath, the problem is NP-hard, so in general, no polynomial time algorithm can solve it (unless $P=NP$).
  2. You have two options left:
    • you can use approximation algorithms as mentioned by @hardmath
    • if your set covering problem has a particular type of structure, specifically, if all $1's$ are consecutive in the matrix columns, then the problem is no longer NP-hard and you can use the approach described above in Ahuja and Magnanti's book.
$\endgroup$
13
  • $\begingroup$ The problem is that I have about 100k variables and 400k constraints so I want to find a more effective algorithm than the tree flow which would require exponential time running. $\endgroup$ Commented Oct 29, 2016 at 0:19
  • 1
    $\begingroup$ @Anonymous I used Tikz :) $\endgroup$ Commented Oct 30, 2016 at 17:19
  • 2
    $\begingroup$ I am not sure this network correctly implements the set covering problem. I guess the outflow at t is the number of sets (equations, i.e. 2). What is the inflow at s? A variable inflow indicating the number of active variables? That would be 1. $\endgroup$ Commented Oct 31, 2016 at 3:39
  • 2
    $\begingroup$ So where is this extra flow coming from? One in at s and two out at t. $\endgroup$ Commented Oct 31, 2016 at 13:08
  • 1
    $\begingroup$ You are absolutely right. This is where the problem comes from. This network flow model does not minimize the number of active variables. Thank you Erwin for finding the error here. Your expertise is definitely valuable. $\endgroup$ Commented Oct 31, 2016 at 13:22

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.