= Parameter Estimation =
Consider the following abstract problem of parameter estimation.

Let <math>U</math> be a finite set of known size, and let <math>G\subseteq U</math>. We want to estimate the ''parameter'' <math>|G|</math>, i.e. the size of <math>G</math>.

We assume two devices:
* A '''uniform sampler''' <math>\mathcal{U}</math>, which returns a uniformly and independently sampled member of <math>U</math> upon each call.
* A '''membership oracle''' for <math>G</math>, denoted <math>\mathcal{O}</math>. Given as input an <math>x\in U</math>, <math>\mathcal{O}(x)</math> indicates whether or not <math>x</math> is a member of <math>G</math>.
Equipped with <math>\mathcal{U}</math> and <math>\mathcal{O}</math>, we have the following Monte Carlo algorithm:
* Choose <math>N</math> independent samples from <math>U</math> by the uniform sampler <math>\mathcal{U}</math>, represented by the random variables <math>X_1,X_2,\ldots, X_N</math>.
* Let <math>Y_i</math> be the indicator random variable defined as <math>Y_i=\mathcal{O}(X_i)</math>; namely, <math>Y_i</math> indicates whether <math>X_i\in G</math>.
* Define the estimator random variable
::<math>Z=\frac{|U|}{N}\sum_{i=1}^N Y_i.</math>
It is easy to see that <math>\mathbf{E}[Z]=|G|</math>, and we may hope that with high probability the value of <math>Z</math> is close to <math>|G|</math>. Formally, <math>Z</math> is called an '''<math>\epsilon</math>-approximation''' of <math>|G|</math> if
:<math>(1-\epsilon)|G|\le Z\le (1+\epsilon)|G|.</math>

The following theorem states that the probabilistic accuracy of the estimation depends on the number of samples and the ratio between <math>|G|</math> and <math>|U|</math>.

{{Theorem
|Theorem (estimator theorem)|
:Let <math>\alpha=\frac{|G|}{|U|}</math>. Then the Monte Carlo method yields an <math>\epsilon</math>-approximation to <math>|G|</math> with probability at least <math>1-\delta</math>, provided
::<math>N\ge\frac{4}{\epsilon^2 \alpha}\ln\frac{2}{\delta}</math>.
}}
{{Proof|
Recall that <math>Y_i</math> indicates whether the <math>i</math>-th sample is in <math>G</math>. Let <math>Y=\sum_{i=1}^NY_i</math>. Then <math>Z=\frac{|U|}{N}Y</math>, and hence the event <math>(1-\epsilon)|G|\le Z\le (1+\epsilon)|G|</math> is equivalent to the event <math>(1-\epsilon)\frac{|G|}{|U|}N\le Y\le (1+\epsilon)\frac{|G|}{|U|}N</math>. Note that each <math>Y_i</math> is a Bernoulli trial that succeeds with probability <math>\frac{|G|}{|U|}</math>, thus <math>\mathbb{E}[Y]=\frac{|G|}{|U|}N</math>. The rest follows from the Chernoff bound.}}

A counting algorithm for the set <math>G</math> has to deal with the following three issues:
# Implement the membership oracle <math>\mathcal{O}</math>. This is usually straightforward, or assumed by the model.
# Implement the uniform sampler <math>\mathcal{U}</math>. This can be straightforward or highly nontrivial, depending on the problem.
# Deal with an exponentially small <math>\alpha=\frac{|G|}{|U|}</math>. This requires us to choose the universe <math>U</math> cleverly, which sometimes needs nontrivial ideas.
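The estimation procedure above can be sketched in a few lines of Python. This is a minimal sketch, not part of the notes: the toy universe, the oracle, and the `alpha_lower` guess are illustrative assumptions, and the sample size is chosen via the bound <math>N\ge\frac{4}{\epsilon^2\alpha}\ln\frac{2}{\delta}</math>.

```python
import math
import random

def estimate_size(universe, is_member, eps=0.1, delta=0.05,
                  alpha_lower=None, rng=random):
    """Monte Carlo estimator Z = (|U|/N) * sum(Y_i) for |G|.

    `universe` plays the role of U, `is_member` is the membership oracle O,
    and `alpha_lower` is an assumed lower bound on alpha = |G|/|U| used to
    pick N via the estimator theorem's bound.
    """
    if alpha_lower is None:
        alpha_lower = 1.0 / len(universe)  # worst-case guess
    # N >= 4/(eps^2 * alpha) * ln(2/delta)
    n = math.ceil(4.0 / (eps ** 2 * alpha_lower) * math.log(2.0 / delta))
    hits = sum(1 for _ in range(n) if is_member(rng.choice(universe)))
    return len(universe) * hits / n

# Toy example: G = multiples of 3 inside U = {0, ..., 999}, so |G| = 334.
U = range(1000)
z = estimate_size(U, lambda x: x % 3 == 0, eps=0.1, delta=0.05,
                  alpha_lower=0.3, rng=random.Random(42))
```

With these parameters the theorem promises <math>Z\in[(1-\epsilon)|G|,(1+\epsilon)|G|]</math> with probability at least <math>0.95</math>, so `z` should land near 334.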
== Flow and Cut ==

=== Flows ===
An instance of the maximum flow problem consists of:
* a directed graph <math>G(V,E)</math>;
* two distinguished vertices <math>s</math> (the '''source''') and <math>t</math> (the '''sink'''), where the in-degree of <math>s</math> and the out-degree of <math>t</math> are both 0;
* a '''capacity function''' <math>c:E\rightarrow\mathbb{R}^+</math>, which associates with each directed edge <math>(u,v)\in E</math> a nonnegative real number <math>c_{uv}</math> called the '''capacity''' of the edge.

The quadruple <math>(G,c,s,t)</math> is called a '''flow network'''.

A function <math>f:E\rightarrow\mathbb{R}^+</math> is called a '''flow''' (to be specific, an '''<math>s</math>-<math>t</math> flow''') in the network <math>G(V,E)</math> if it satisfies:
* '''Capacity constraint:''' <math>f_{uv}\le c_{uv}</math> for all <math>(u,v)\in E</math>.
* '''Conservation constraint:''' <math>\sum_{u:(u,v)\in E}f_{uv}=\sum_{w:(v,w)\in E}f_{vw}</math> for all <math>v\in V\setminus\{s,t\}</math>.

The '''value''' of the flow <math>f</math> is <math>\sum_{v:(s,v)\in E}f_{sv}</math>.
Given a flow network, the maximum flow problem asks to find a flow of maximum value.

The maximum flow problem can be described as the following linear program:
:<math>
\begin{align}
\text{maximize} \quad & \sum_{v:(s,v)\in E}f_{sv}\\
\text{subject to} \quad & f_{uv}\le c_{uv} &&\forall (u,v)\in E\\
& \sum_{u:(u,v)\in E}f_{uv}-\sum_{w:(v,w)\in E}f_{vw} =0 &&\forall v\in V\setminus\{s,t\}\\
& f_{uv}\ge 0 &&\forall (u,v)\in E
\end{align}
</math>
=== Cuts ===
{{Theorem|Definition|
:Let <math>(G(V,E),c,s,t)</math> be a flow network. Let <math>S\subset V</math>. We call <math>(S,\bar{S})</math> an '''<math>s</math>-<math>t</math> cut''' if <math>s\in S</math> and <math>t\not\in S</math>.
:The '''value''' of the cut (also called the '''capacity''' of the cut) is defined as <math>\sum_{u\in S,v\not\in S\atop (u,v)\in E}c_{uv}</math>.
}}

A fundamental fact in flow theory is that cuts always upper bound flows.

{{Theorem|Lemma|
:Let <math>(G(V,E),c,s,t)</math> be a flow network. Let <math>f</math> be an arbitrary flow in <math>G</math>, and let <math>(S,\bar{S})</math> be an arbitrary <math>s</math>-<math>t</math> cut. Then
::<math>\sum_{v:(s,v)\in E}f_{sv}\le \sum_{u\in S,v\not\in S\atop (u,v)\in E}c_{uv}</math>,
:that is, the value of any flow is no greater than the value of any cut.
}}
{{Proof|By the definition of an <math>s</math>-<math>t</math> cut, <math>s\in S</math> and <math>t\not\in S</math>.

Due to the conservation of flow,
:<math>\sum_{u\in S}\left(\sum_{v:(u,v)\in E}f_{uv}-\sum_{v:(v,u)\in E}f_{vu}\right)=\sum_{v:(s,v)\in E}f_{sv}+\sum_{u\in S\setminus\{s\}}\left(\sum_{v:(u,v)\in E}f_{uv}-\sum_{v:(v,u)\in E}f_{vu}\right)=\sum_{v:(s,v)\in E}f_{sv}\,.</math>
On the other hand, summing the flow over edges,
:<math>\sum_{v\in S}\left(\sum_{u:(u,v)\in E}f_{uv}-\sum_{u:(v,u)\in E}f_{vu}\right)=\sum_{u\in S,v\in S\atop (u,v)\in E}\left(f_{uv}-f_{uv}\right)+\sum_{u\in S,v\not\in S\atop (u,v)\in E}f_{uv}-\sum_{u\in S,v\not\in S\atop (v,u)\in E}f_{vu}=\sum_{u\in S,v\not\in S\atop (u,v)\in E}f_{uv}-\sum_{u\in S,v\not\in S\atop (v,u)\in E}f_{vu}\,.</math>
Therefore,
:<math>\sum_{v:(s,v)\in E}f_{sv}=\sum_{u\in S,v\not\in S\atop (u,v)\in E}f_{uv}-\sum_{u\in S,v\not\in S\atop (v,u)\in E}f_{vu}\le\sum_{u\in S,v\not\in S\atop (u,v)\in E}f_{uv}\le \sum_{u\in S,v\not\in S\atop (u,v)\in E}c_{uv}\,.</math>
}}
=== Augmenting paths ===
{{Theorem|Definition (Augmenting path)|
:Let <math>f</math> be a flow in <math>G</math>. An '''augmenting path to <math>u_k</math>''' is a sequence of distinct vertices <math>P=(u_0,u_1,\cdots, u_k)</math> such that
:* <math>u_0=s\,</math>;
:and each pair of consecutive vertices <math>u_{i}u_{i+1}\,</math> in <math>P</math> corresponds to either a '''forward edge''' <math>(u_{i},u_{i+1})\in E</math> or a '''reverse edge''' <math>(u_{i+1},u_{i})\in E</math>, where
:* <math>f(u_i,u_{i+1})<c(u_i,u_{i+1})\,</math> when <math>u_{i}u_{i+1}\,</math> corresponds to a forward edge <math>(u_{i},u_{i+1})\in E</math>, and
:* <math>f(u_{i+1},u_i)>0\,</math> when <math>u_{i}u_{i+1}\,</math> corresponds to a reverse edge <math>(u_{i+1},u_{i})\in E</math>.
:If <math>u_k=t\,</math>, we simply call <math>P</math> an '''augmenting path'''.
}}

Let <math>f</math> be a flow in <math>G</math>, and suppose there is an augmenting path <math>P=u_0u_1\cdots u_k</math>, where <math>u_0=s</math> and <math>u_k=t</math>. Let <math>\epsilon>0</math> be a positive constant satisfying
*<math>\epsilon \le c(u_{i},u_{i+1})-f(u_i,u_{i+1})</math> for all forward edges <math>(u_{i},u_{i+1})\in E</math> in <math>P</math>;
*<math>\epsilon \le f(u_{i+1},u_i)</math> for all reverse edges <math>(u_{i+1},u_i)\in E</math> in <math>P</math>.
By the definition of an augmenting path, such a positive <math>\epsilon</math> always exists.

Increase <math>f(u_i,u_{i+1})</math> by <math>\epsilon</math> for all forward edges <math>(u_{i},u_{i+1})\in E</math> in <math>P</math>, and decrease <math>f(u_{i+1},u_i)</math> by <math>\epsilon</math> for all reverse edges <math>(u_{i+1},u_i)\in E</math> in <math>P</math>. Denote the modified flow by <math>f'</math>. It can be verified that <math>f'</math> satisfies the capacity constraint and the conservation constraint, and thus is still a valid flow. On the other hand, the value of the new flow <math>f'</math> is
:<math>\sum_{v:(s,v)\in E}f_{sv}'=\epsilon+\sum_{v:(s,v)\in E}f_{sv}>\sum_{v:(s,v)\in E}f_{sv}</math>.
Therefore, the value of the flow can be "augmented" by adjusting the flow on the augmenting path. This immediately implies that if a flow is maximum, then there is no augmenting path. Surprisingly, the converse is also true, so maximum flows are "characterized" by augmenting paths.
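Repeatedly augmenting along such paths is the Ford–Fulkerson method; choosing each augmenting path by BFS over the residual graph gives the Edmonds–Karp algorithm. The following is a minimal Python sketch, not part of the notes; the dict-of-capacities encoding is an illustrative assumption.

```python
from collections import deque

def max_flow(n, capacity, s, t):
    """Edmonds-Karp: find a shortest augmenting path via BFS over the
    residual graph (which encodes forward and reverse edges), then augment.
    `capacity` is a dict {(u, v): c_uv}; vertices are 0..n-1."""
    flow = {e: 0 for e in capacity}

    def residual(u, v):
        # c - f on the forward edge plus cancellable flow on the reverse edge
        return capacity.get((u, v), 0) - flow.get((u, v), 0) + flow.get((v, u), 0)

    value = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in range(n):
                if v not in parent and residual(u, v) > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:           # no augmenting path: flow is maximum
            return value, flow
        path, v = [], t
        while parent[v] is not None:  # recover the s-t path
            path.append((parent[v], v))
            v = parent[v]
        eps = min(residual(u, v) for u, v in path)   # bottleneck epsilon
        for u, v in path:             # push eps, cancelling reverse flow first
            back = min(eps, flow.get((v, u), 0))
            if back:
                flow[(v, u)] -= back
            if eps - back:
                flow[(u, v)] = flow.get((u, v), 0) + (eps - back)
        value += eps

cap = {(0, 1): 3, (0, 2): 2, (1, 2): 1, (1, 3): 2, (2, 3): 3}
value, flow = max_flow(4, cap, 0, 3)
```

On this toy network the cut <math>S=\{s\}</math> has capacity 5, and the algorithm indeed returns a flow of value 5, so both are optimal.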
{{Theorem|Lemma|
:A flow <math>f</math> is maximum if and only if there are no augmenting paths.
}}
{{Proof|We have already proved the "only if" direction above. Now we prove the "if" direction.

Let <math>S=\{u\in V\mid \exists\text{an augmenting path to }u\}</math>. Clearly <math>s\in S</math>, and since there is no augmenting path, <math>t\not\in S</math>. Therefore, <math>(S,\bar{S})</math> defines an <math>s</math>-<math>t</math> cut.

We claim that
:<math>\sum_{v:(s,v)\in E}f_{sv}= \sum_{u\in S,v\not\in S\atop (u,v)\in E}c_{uv}</math>,
that is, the value of the flow <math>f</math> equals the value of the cut <math>(S,\bar{S})</math> defined above. By the above lemma, this will imply that the current flow <math>f</math> is maximum.

To prove this claim, we first observe that
:<math>\sum_{v:(s,v)\in E}f_{sv}= \sum_{u\in S,v\not\in S\atop (u,v)\in E}f_{uv}-\sum_{u\in S,v\not\in S\atop (v,u)\in E}f_{vu}</math>.
This identity is implied by the flow conservation constraint, and holds for any <math>s</math>-<math>t</math> cut <math>(S,\bar{S})</math>.

We then claim that
*<math>f_{uv}=c_{uv}</math> for all <math>u\in S,v\not\in S, (u,v)\in E</math>; and
*<math>f_{vu}=0</math> for all <math>u\in S,v\not\in S, (v,u)\in E</math>.
Otherwise, appending <math>uv</math> to the augmenting path to <math>u</math> yields a new augmenting path to <math>v</math>, which contradicts the fact that <math>S</math> includes all vertices to which there exist augmenting paths.

Therefore,
:<math>\sum_{v:(s,v)\in E}f_{sv}= \sum_{u\in S,v\not\in S\atop (u,v)\in E}f_{uv}-\sum_{u\in S,v\not\in S\atop (v,u)\in E}f_{vu} = \sum_{u\in S,v\not\in S\atop (u,v)\in E}c_{uv}</math>.
As discussed above, this proves the lemma.
}}
== Max-Flow Min-Cut ==

=== The max-flow min-cut theorem ===
{{Theorem|Max-Flow Min-Cut Theorem|
:In a flow network, the maximum value of any <math>s</math>-<math>t</math> flow equals the minimum value of any <math>s</math>-<math>t</math> cut.
}}

{{Proof|
Let <math>f</math> be a flow with maximum value, so there is no augmenting path.

Again, let
:<math>S=\{u\in V\mid \exists\text{an augmenting path to }u\}</math>.
As proved above, <math>(S,\bar{S})</math> forms an <math>s</math>-<math>t</math> cut, and
:<math>\sum_{v:(s,v)\in E}f_{sv}= \sum_{u\in S,v\not\in S\atop (u,v)\in E}c_{uv}</math>,
that is, the value of the flow <math>f</math> equals the value of the cut <math>(S,\bar{S})</math>.

Since the value of any <math>s</math>-<math>t</math> flow is no greater than the value of any <math>s</math>-<math>t</math> cut, the value of the flow <math>f</math> equals the minimum value of any <math>s</math>-<math>t</math> cut.
}}
=== Flow Integrality Theorem ===
{{Theorem|Flow Integrality Theorem|
:Let <math>(G,c,s,t)</math> be a flow network with integral capacities <math>c</math>. There exists an integral flow which is maximum.
}}
{{Proof|
Let <math>f</math> be an integral flow of maximum value among all integral flows. If there were an augmenting path, then, since both <math>c</math> and <math>f</math> are integral, a new integral flow could be constructed whose value exceeds that of <math>f</math> by at least 1, contradicting that <math>f</math> is maximum over all integral flows. Therefore, there is no augmenting path, which means that <math>f</math> is maximum over all flows, integral or not.
}}
=== Applications: Menger's theorem ===
Given an undirected graph <math>G(V,E)</math> and two distinct vertices <math>s,t\in V</math>, a set of edges <math>C\subseteq E</math> is called an '''<math>s</math>-<math>t</math> cut''' if deleting the edges in <math>C</math> disconnects <math>s</math> and <math>t</math>.

A simple path from <math>s</math> to <math>t</math> is called an '''<math>s</math>-<math>t</math> path'''. Two paths are said to be '''edge-disjoint''' if they do not share any edge.

{{Theorem|Theorem (Menger 1927)|
:Let <math>G(V,E)</math> be an arbitrary undirected graph and <math>s,t\in V</math> be two distinct vertices. The minimum size of any <math>s</math>-<math>t</math> cut equals the maximum number of edge-disjoint <math>s</math>-<math>t</math> paths.
}}
{{proof|
Construct a directed graph <math>G'(V,E')</math> from <math>G(V,E)</math> as follows: replace every undirected edge <math>uv\in E</math> with <math>s,t\not\in\{u,v\}</math> by the two directed edges <math>(u,v)</math> and <math>(v,u)</math>; replace every undirected edge <math>sv\in E</math> by <math>(s,v)</math>, and every undirected edge <math>vt\in E</math> by <math>(v,t)</math>. Then assign every directed edge capacity 1.

It is easy to verify that in the flow network constructed above, the following hold:
*An integral <math>s</math>-<math>t</math> flow corresponds to a set of edge-disjoint <math>s</math>-<math>t</math> paths in the undirected graph <math>G</math>, where the value of the flow is the number of <math>s</math>-<math>t</math> paths.
*An <math>s</math>-<math>t</math> cut in the flow network corresponds to an <math>s</math>-<math>t</math> cut in the undirected graph <math>G</math> with the same value.
Menger's theorem then follows as a direct consequence of the max-flow min-cut theorem.
}}
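The directed construction in the proof can be sketched mechanically. This is a minimal illustrative sketch, not part of the notes; the edge-list encoding is a hypothetical convenience.

```python
def undirected_to_unit_network(edges, s, t):
    """Build the directed unit-capacity network G' from the proof of
    Menger's theorem: internal edges become a pair of opposite arcs,
    edges at s only leave s, and edges at t only enter t."""
    capacity = {}
    for u, v in edges:
        for a, b in ((u, v), (v, u)):
            # drop arcs that would enter s or leave t
            if a != t and b != s:
                capacity[(a, b)] = 1
    return capacity

cap = undirected_to_unit_network([(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)],
                                 s=0, t=3)
```

Feeding this network to any max-flow routine counts the edge-disjoint <math>s</math>-<math>t</math> paths; here the cut at <math>s</math> has capacity 2 and the paths <math>0\text{-}1\text{-}3</math>, <math>0\text{-}2\text{-}3</math> achieve it.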
=== Applications: König-Egerváry theorem ===
Let <math>G(V,E)</math> be a graph. An edge set <math>M\subseteq E</math> is called a '''matching''' if no two edges in <math>M</math> share any vertex. A vertex set <math>C\subseteq V</math> is called a '''vertex cover''' if for any edge <math>uv\in E</math>, either <math>u\in C</math> or <math>v\in C</math>.

{{Theorem|Theorem (König 1936)|
:In any bipartite graph <math>G(V_1,V_2,E)</math>, the size of a ''maximum'' matching equals the size of a ''minimum'' vertex cover.
}}
We now show a reduction from bipartite matchings to flows.

Construct a flow network <math>(G'(V,E'),c,s,t)</math> as follows:
* <math>V=V_1\cup V_2\cup\{s,t\}</math>, where <math>s</math> and <math>t</math> are two new vertices.
* For every <math>uv\in E</math>, add <math>(u,v)</math> to <math>E'</math>; for every <math>u\in V_1</math>, add <math>(s,u)</math> to <math>E'</math>; and for every <math>v\in V_2</math>, add <math>(v,t)</math> to <math>E'</math>.
* Let <math>c_{su}=1</math> for every <math>u\in V_1</math> and <math>c_{vt}=1</math> for every <math>v\in V_2</math>. Let <math>c_{uv}=\infty</math> for every bipartite edge <math>(u,v)</math>.

{{Theorem|Lemma|
:The size of a maximum matching in <math>G</math> is equal to the value of a maximum <math>s</math>-<math>t</math> flow in <math>G'</math>.
}}
{{proof|
Given an integral <math>s</math>-<math>t</math> flow <math>f</math> in <math>G'</math>, let <math>M=\{uv\in E\mid f_{uv}=1\}</math>. Then <math>M</math> must be a matching. To see this, observe that for every <math>u\in V_1</math> there is at most one <math>v\in V_2</math> with <math>f_{uv}=1</math>, because <math>f_{su}\le c_{su}=1</math> and flow is conserved at <math>u</math>. The same holds for vertices in <math>V_2</math> by the same argument. Therefore, each integral flow corresponds to a matching.

Given a matching <math>M</math> in the bipartite graph <math>G</math>, define an integral flow <math>f</math> as follows: for <math>uv\in E</math>, <math>f_{uv}=1</math> if <math>uv\in M</math> and <math>f_{uv}=0</math> otherwise; for <math>u\in V_1</math>, <math>f_{su}=1</math> if <math>uv\in M</math> for some <math>v</math> and <math>f_{su}=0</math> otherwise; for <math>v\in V_2</math>, <math>f_{vt}=1</math> if <math>uv\in M</math> for some <math>u</math> and <math>f_{vt}=0</math> otherwise.

It is easy to check that <math>f</math> is a valid <math>s</math>-<math>t</math> flow in <math>G'</math>. Therefore, there is a one-to-one correspondence between integral flows in <math>G'</math> and matchings in <math>G</math>. The lemma follows naturally.
}}

We then establish a correspondence between <math>s</math>-<math>t</math> cuts in <math>G'</math> and vertex covers in <math>G</math>.

{{Theorem|Lemma|
:The size of a minimum vertex cover in <math>G</math> is equal to the value of a minimum <math>s</math>-<math>t</math> cut in <math>G'</math>.
}}
{{proof|
Let <math>(S,\bar{S})</math> be an <math>s</math>-<math>t</math> cut of minimum capacity in <math>G'</math>. Its capacity <math>\sum_{u\in S, v\not\in S\atop (u,v)\in E'}c_{uv}</math> must be finite, since <math>S=\{s\}</math> gives an <math>s</math>-<math>t</math> cut whose capacity is <math>|V_1|</math>, which is finite. Therefore, no edge <math>uv\in E</math> has <math>u\in V_1\cap S</math> and <math>v\in V_2\setminus S</math>; i.e., for all <math>uv\in E</math>, either <math>u\in V_1\setminus S</math> or <math>v\in V_2\cap S</math>. Therefore, <math>(V_1\setminus S)\cup(V_2\cap S)</math> is a vertex cover in <math>G</math>, whose size is
:<math>|(V_1\setminus S)\cup(V_2\cap S)|=|V_1\setminus S|+|V_2\cap S|=\sum_{u\in V_1\setminus S}c_{su}+\sum_{v\in V_2\cap S}c_{vt}=\sum_{u\in S,v\not\in S\atop (u,v)\in E'}c_{uv}</math>.
The last term is the capacity of the minimum <math>s</math>-<math>t</math> cut <math>(S,\bar{S})</math>.
}}

The König-Egerváry theorem then follows as a consequence of the max-flow min-cut theorem.
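The construction of <math>G'</math> is easy to mechanize. A minimal sketch under illustrative assumptions (the labels `'s'`, `'t'` and the dict encoding are not from the notes):

```python
import math

def bipartite_to_network(v1, v2, edges):
    """Build the flow network G' from a bipartite graph (V1, V2, E):
    source 's' feeds V1 with unit capacities, V2 drains into sink 't'
    with unit capacities, and bipartite edges get infinite capacity."""
    capacity = {('s', u): 1 for u in v1}
    capacity.update({(v, 't'): 1 for v in v2})
    capacity.update({(u, v): math.inf for u, v in edges})
    return capacity

cap = bipartite_to_network(['a', 'b'], ['x', 'y'],
                           [('a', 'x'), ('b', 'x'), ('b', 'y')])
```

Any integral max-flow routine run on this network then returns the size of a maximum matching, by the first lemma above.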
== Linear Programming ==

Consider the following LP:
:<math>
\begin{align}
\text{minimize} \quad & 7x_1+x_2+5x_3\\
\text{subject to} \quad & x_1-x_2+3x_3 \ge 10\\
& 5x_1+2x_2-x_3 \ge 6\\
& x_1,x_2,x_3 \ge 0
\end{align}
</math>

Let <math>OPT</math> be the value of the optimal solution. We want to estimate upper and lower bounds for <math>OPT</math>.

Since <math>OPT</math> is the minimum over the feasible set, every feasible solution gives an upper bound on <math>OPT</math>. For example, <math>\boldsymbol{x}=(2,1,3)</math> is a feasible solution, thus <math>OPT\le 7\cdot 2+1+5\cdot 3=30</math>.

For the lower bound, the optimal solution must satisfy the two constraints:
:<math>
\begin{align}
x_1-x_2+3x_3 &\ge 10,\\
5x_1+2x_2-x_3 &\ge 6.
\end{align}
</math>
Since the <math>x_i</math>'s are restricted to be nonnegative, term-by-term comparison of coefficients shows that
:<math>7x_1+x_2+5x_3\ge(x_1-x_2+3x_3)+(5x_1+2x_2-x_3)\ge 16.</math>
The idea behind this lower-bound argument is to find suitable nonnegative multipliers (in the above case the multipliers are all 1) for the constraints, so that when we take their weighted sum, the coefficient of each <math>x_i</math> in the sum is dominated by the corresponding coefficient in the objective function. It is important that the multipliers be nonnegative, so that they do not reverse the direction of the constraint inequalities.

To find the best lower bound, we need to choose the multipliers in such a way that the sum is as large as possible. Interestingly, the problem of finding the best lower bound can be formulated as another LP:
:<math>
\begin{align}
\text{maximize} \quad & 10y_1+6y_2\\
\text{subject to} \quad & y_1+5y_2 \le 7\\
& -y_1+2y_2 \le 1\\
& 3y_1-y_2 \le 5\\
& y_1,y_2 \ge 0
\end{align}
</math>

Here <math>y_1</math> and <math>y_2</math> are the nonnegative multipliers for the first and the second constraint, respectively. We call the first LP the '''primal program''' and the second LP the '''dual program'''. By definition, every feasible solution to the dual program gives a lower bound for the primal program.
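Both bounds can be checked mechanically. A small pure-Python sketch (not part of the notes) verifying that <math>\boldsymbol{x}=(2,1,3)</math> is primal-feasible and <math>\boldsymbol{y}=(1,1)</math> is dual-feasible:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

c = [7, 1, 5]                      # objective: minimize c . x
A = [[1, -1, 3], [5, 2, -1]]       # constraints: A x >= b, x >= 0
b = [10, 6]

x = [2, 1, 3]                      # a feasible primal point -> upper bound
assert all(dot(row, x) >= bi for row, bi in zip(A, b)) and min(x) >= 0
upper = dot(c, x)                  # 7*2 + 1 + 5*3 = 30

y = [1, 1]                         # feasible dual multipliers -> lower bound
cols = list(zip(*A))               # columns of A, i.e. rows of A^T
assert all(dot(col, y) <= ci for col, ci in zip(cols, c)) and min(y) >= 0
lower = dot(b, y)                  # 10 + 6 = 16
```

So <math>16\le OPT\le 30</math>, exactly the bounds derived above.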
=== Duality ===
Given an LP in canonical form, called the '''primal''' LP:
:<math>
\begin{align}
\text{minimize} \quad & \boldsymbol{c}^T\boldsymbol{x}\\
\text{subject to} \quad & A\boldsymbol{x} \ge\boldsymbol{b}\\
& \boldsymbol{x} \ge \boldsymbol{0}
\end{align}
</math>
the '''dual''' LP is defined as follows:
:<math>
\begin{align}
\text{maximize} \quad & \boldsymbol{b}^T\boldsymbol{y}\\
\text{subject to} \quad & A^T\boldsymbol{y} \le\boldsymbol{c}\\
& \boldsymbol{y} \ge \boldsymbol{0}
\end{align}
</math>
==== LP for maximum flow ====
In the last lecture, we defined the maximum flow problem, whose LP is
:<math>
\begin{align}
\text{maximize} \quad & \sum_{v:(s,v)\in E}f_{sv}\\
\text{subject to} \quad & f_{uv}\le c_{uv} &&\forall (u,v)\in E\\
& \sum_{u:(u,v)\in E}f_{uv}-\sum_{w:(v,w)\in E}f_{vw} =0 &&\forall v\in V\setminus\{s,t\}\\
& f_{uv}\ge 0 &&\forall (u,v)\in E
\end{align}
</math>
where the directed graph <math>G(V,E)</math> is the flow network, <math>s\in V</math> is the source, <math>t\in V</math> is the sink, and <math>c_{uv}</math> is the capacity of the directed edge <math>(u,v)\in E</math>.

We add a new edge from <math>t</math> to <math>s</math> with capacity <math>c_{ts}=\infty</math>. Let <math>E'</math> be the new edge set. The LP for the max-flow problem can be rewritten as:
:<math>
\begin{align}
\text{maximize} \quad & f_{ts}\\
\text{subject to} \quad & f_{uv}\le c_{uv} &&\forall (u,v)\in E\\
& \sum_{u:(u,v)\in E'}f_{uv}-\sum_{w:(v,w)\in E'}f_{vw} \le0 &&\forall v\in V\\
& f_{uv}\ge 0 &&\forall (u,v)\in E'
\end{align}
</math>
The second set of inequalities seems weaker than the original conservation constraint of flows; however, if these inequalities hold at every node, then in fact they must be satisfied with equality at every node, thereby implying flow conservation.
To obtain the dual program, we introduce variables <math>d_{uv}</math> and <math>p_v</math> corresponding to the two types of inequalities in the primal. The dual LP is:
:<math>
\begin{align}
\text{minimize} \quad & \sum_{(u,v)\in E}c_{uv}d_{uv}\\
\text{subject to} \quad & d_{uv}-p_u+p_v \ge 0 &&\forall (u,v)\in E\\
& p_s-p_t \ge1 \\
& d_{uv} \ge 0 &&\forall (u,v)\in E\\
& p_v\ge 0 &&\forall v\in V
\end{align}
</math>
It is more helpful to consider its integer version:
:<math>
\begin{align}
\text{minimize} \quad & \sum_{(u,v)\in E}c_{uv}d_{uv}\\
\text{subject to} \quad & d_{uv}-p_u+p_v \ge 0 &&\forall (u,v)\in E\\
& p_s-p_t \ge1 \\
& d_{uv} \in\{0,1\} &&\forall (u,v)\in E\\
& p_v\in\{0,1\} &&\forall v\in V
\end{align}
</math>
Suppose that for the flow LP, the integral optimum always equals the fractional optimum (we will show this later).

The variables <math>p_v</math> define a bipartition of the vertex set <math>V</math>: let <math>S=\{v\in V\mid p_v=1\}</math>, with complement <math>\bar{S}=\{v\in V\mid p_v=0\}</math>.

For 0/1-valued variables, the only way to satisfy <math>p_s-p_t\ge1</math> is to have <math>p_s=1</math> and <math>p_t=0</math>. Therefore, <math>(S,\bar{S})</math> is an <math>s</math>-<math>t</math> cut.

In an optimal solution, <math>d_{uv}=1</math> if and only if <math>u\in S,v\in\bar{S}</math> and <math>(u,v)\in E</math>. Therefore, the objective value of an optimal solution, <math>\sum_{u\in S,v\not\in S\atop (u,v)\in E}c_{uv}</math>, is the capacity of the minimum <math>s</math>-<math>t</math> cut <math>(S,\bar{S})</math>.
==== LP duality theorem ====
Consider the primal LP:
:<math>
\begin{align}
\text{minimize} \quad & \boldsymbol{c}^T\boldsymbol{x}\\
\text{subject to} \quad & A\boldsymbol{x} \ge\boldsymbol{b}\\
& \boldsymbol{x} \ge \boldsymbol{0}
\end{align}
</math>

Its dual LP is:
:<math>
\begin{align}
\text{maximize} \quad & \boldsymbol{b}^T\boldsymbol{y}\\
\text{subject to} \quad & A^T\boldsymbol{y} \le\boldsymbol{c}\\
& \boldsymbol{y} \ge \boldsymbol{0}
\end{align}
</math>

{{Theorem|Theorem|
: The dual of the dual is the primal.
}}
{{Proof|
The dual program can be written as the following minimization in canonical form:
:<math>
\begin{align}
\min \quad & -\boldsymbol{b}^T\boldsymbol{y}\\
\text{s.t.} \quad & -A^T\boldsymbol{y} \ge-\boldsymbol{c}\\
& \boldsymbol{y} \ge \boldsymbol{0}
\end{align}
</math>
Its dual is:
:<math>
\begin{align}
\max \quad & -\boldsymbol{c}^T\boldsymbol{x}\\
\text{s.t.} \quad & -A\boldsymbol{x} \le-\boldsymbol{b}\\
& \boldsymbol{x} \ge \boldsymbol{0}
\end{align}
</math>
which is equivalent to the primal:
:<math>
\begin{align}
\min \quad & \boldsymbol{c}^T\boldsymbol{x}\\
\text{s.t.} \quad & A\boldsymbol{x} \ge\boldsymbol{b}\\
& \boldsymbol{x} \ge \boldsymbol{0}
\end{align}
</math>
}}
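The involution in the "dual of the dual" argument can be checked mechanically. A minimal sketch (not part of the notes) representing a canonical min-form LP <math>\min\{\boldsymbol{c}^T\boldsymbol{x}\mid A\boldsymbol{x}\ge\boldsymbol{b},\boldsymbol{x}\ge\boldsymbol{0}\}</math> as a hypothetical `(c, A, b)` triple:

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def neg_vec(v):
    return [-x for x in v]

def neg_mat(m):
    return [neg_vec(row) for row in m]

def dual(lp):
    """Dual of min{c.x : Ax >= b, x >= 0}, re-expressed in canonical
    min-form: max{b.y : A^T y <= c, y >= 0} == min{-b.y : -A^T y >= -c}."""
    c, A, b = lp
    return (neg_vec(b), neg_mat(transpose(A)), neg_vec(c))

primal = ([7, 1, 5], [[1, -1, 3], [5, 2, -1]], [10, 6])
assert dual(dual(primal)) == primal   # dualizing twice returns the primal
```

The two sign flips and the double transpose cancel exactly as in the proof above.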
We have shown that feasible solutions of the dual program can be used to lower bound the optimum of the primal program. This is formalized by the following important theorem.

{{Theorem|Theorem (Weak duality theorem)|
:If there exists an optimal solution to the primal LP
::<math>
\begin{align}
\min \quad & \boldsymbol{c}^T\boldsymbol{x}\\
\text{s.t.} \quad & A\boldsymbol{x} \ge\boldsymbol{b}\\
& \boldsymbol{x} \ge \boldsymbol{0}
\end{align}
</math>
:then its optimum is no smaller than the optimum of its dual
::<math>
\begin{align}
\max \quad & \boldsymbol{b}^T\boldsymbol{y}\\
\text{s.t.} \quad & A^T\boldsymbol{y} \le\boldsymbol{c}\\
& \boldsymbol{y} \ge \boldsymbol{0}
\end{align}
</math>
}}
{{proof|
Let <math>\boldsymbol{x}</math> be an arbitrary feasible solution to the primal LP, and <math>\boldsymbol{y}</math> be an arbitrary feasible solution to the dual LP.

We estimate <math>\boldsymbol{y}^TA\boldsymbol{x}</math> in two ways. Recall that <math>A\boldsymbol{x} \ge\boldsymbol{b}</math> and <math>A^T\boldsymbol{y} \le\boldsymbol{c}</math>, thus
:<math>\boldsymbol{y}^T\boldsymbol{b}\le\boldsymbol{y}^TA\boldsymbol{x}\le\boldsymbol{c}^T\boldsymbol{x}</math>.

Since this holds for arbitrary feasible solutions, it also holds for the optimal solutions.
}}
A beautiful result is that the optima of the primal LP and its dual are in fact always equal. This is called the strong duality theorem of linear programming.

{{Theorem|Theorem (Strong duality theorem)|
:If there exists an optimal solution to the primal LP
::<math>
\begin{align}
\min \quad & \boldsymbol{c}^T\boldsymbol{x}\\
\text{s.t.} \quad & A\boldsymbol{x} \ge\boldsymbol{b}\\
& \boldsymbol{x} \ge \boldsymbol{0}
\end{align}
</math>
:then its optimum equals the optimum of its dual
::<math>
\begin{align}
\max \quad & \boldsymbol{b}^T\boldsymbol{y}\\
\text{s.t.} \quad & A^T\boldsymbol{y} \le\boldsymbol{c}\\
& \boldsymbol{y} \ge \boldsymbol{0}
\end{align}
</math>
}}
=== Integrality ===
Consider the '''maximum integral flow''' problem: given as input a flow network <math>(G(V,E),c,s,t)</math> where for every <math>(u,v)\in E</math> the capacity <math>c_{uv}</math> is an integer, find an integral flow <math>f:E\rightarrow\mathbb{Z}</math> of maximum value.

The mathematical programming formulation of the problem is:
:<math>
\begin{align}
\text{maximize} \quad & \sum_{v:(s,v)\in E}f_{sv}\\
\text{subject to} \quad & f_{uv}\le c_{uv} &&\forall (u,v)\in E\\
& \sum_{u:(u,v)\in E}f_{uv}-\sum_{w:(v,w)\in E}f_{vw} =0 &&\forall v\in V\setminus\{s,t\}\\
& f_{uv}\in\mathbb{N} &&\forall (u,v)\in E
\end{align}
</math>
where <math>\mathbb{N}</math> is the set of all nonnegative integers. Compared to the LP for the max-flow problem, we just replace the last constraint <math>f_{uv}\ge 0</math> with <math>f_{uv}\in\mathbb{N}</math>. The resulting optimization problem is called an '''integer programming (IP)''', or more specifically an '''integer linear programming (ILP)''', problem.

Due to the Flow Integrality Theorem, when the capacities are integers, there must be an integral flow whose value is maximum among all flows (integral or not). This means the above IP can be efficiently solved by solving its LP-relaxation. This is usually impossible for general IPs.
| | DNF counting can be interpreted in this general framework as follows. Suppose that the DNF formula <math>\phi</math> is defined on <math>n</math> variables, and <math>\phi</math> contains <math>m</math> clauses <math>C_1,C_2,\ldots,C_m</math>, where clause <math>C_i</math> has <math>k_i</math> literals. Without loss of generality, we assume that in each clause, each variable appears at most once. |
| | * <math>V</math> is the set of all assignments. |
| | *Each <math>H_i</math> is the set of satisfying assignments for the <math>i</math>-th clause <math>C_i</math> of the DNF formular <math>\phi</math>. Then the union of sets <math>H=\bigcup_i H_i</math> gives the set of satisfying assignments for <math>\phi</math>. |
| | * Each clause <math>C_i</math> is a conjunction (AND) of literals. It is not hard to see that <math>|H_i|=2^{n-k_i}</math>, which is efficiently computable. |
| | * Sampling from an <math>H_i</math> is simple: we just fix the assignments of the <math>k_i</math> literals of that clause, and sample uniformly and independently the rest <math>(n-k_i)</math> variable assignments. |
| | * For each assignment <math>x</math>, it is easy to check whether it satisfies a clause <math>C_i</math>, thus it is easy to determine whether <math>x\in H_i</math>. |
|
| |
|
| Generally, an IP of canonical form is written as
| | ==The coverage algorithm== |
| :<math>
| | We now introduce the coverage algorithm for the union of sets problem. |
| \begin{align}
| |
| \text{maximize} \quad& \boldsymbol{c}^T\boldsymbol{x}\\
| |
| \begin{align}
| |
| \text{subject to} \\
| |
| \\
| |
| \\
| |
| \end{align}
| |
| \quad &
| |
| \begin{align}
| |
| A\boldsymbol{x} &\ge\boldsymbol{b} \\
| |
| \boldsymbol{x}&\ge \boldsymbol{0}\\
| |
| \boldsymbol{x}&\in\mathbb{Z}^n
| |
| \end{align}
| |
| \end{align}
| |
| </math>
| |
|
| |
|
| Consider the '''3SAT''' problem. Each instance is a '''3CNF(conjunctive normal form)''': <math>\bigwedge_{i=1}^m(\ell_{i_1}\vee\ell_{i_2}\vee\ell_{i_3})</math>, where each <math>(\ell_{i_1}\vee\ell_{i_2}\vee\ell_{i_3})</math> is a '''clause''' and each <math>\ell_{i_r}\in\{x_{j},\neg x_{j}\mid 1\le j\le n\}</math>, called a '''literal''', is either a boolean variable or a negation of a boolean variable. We want to determine whether there exists an truth assignment of the <math>n</math> boolean variables <math>x_1,\ldots,x_n</math> such that the input formula is satisfied (i.e., is true). | | Consider the multiset <math>U</math> defined by |
| | :<math>U=H_1\uplus H_2\uplus\cdots \uplus H_m</math>, |
| | where <math>\uplus</math> denotes the multiset union. It is more convenient to define <math>U</math> as the set |
| | :<math>U=\{(x,i)\mid x\in H_i\}</math>. |
| | For each <math>x\in H</math>, there may be more than one instances of <math>(x,i)\in U</math>. We can choose a unique representative among the multiple instances <math>(x,i)\in U</math> for the same <math>x\in H</math>, by choosing the <math>(x,i)</math> with the minimum <math>i</math>, and form a set <math>G</math>. |
|
| |
|
| The following IP solves 3SAT:
| | Formally, <math>G=\{(x,i)\in U\mid \forall (x,j)\in U, j\le i\}</math>. Every <math>x\in H</math> corresponds to a unique <math>(x,i)\in G</math> where <math>i</math> is the smallest among <math>x\in H_i</math>. |
| :<math>
| |
| \begin{align} | |
| \text{maximize} \quad& \sum_{i=1}^mz_i\\
| |
| \begin{align}
| |
| \text{subject to} \\
| |
| \\
| |
| \\
| |
| \\
| |
| \end{align}
| |
| \quad &
| |
| \begin{align}
| |
| z_i &\le y_{i_1}+y_{i_2}+y_{i_3} &\quad& \forall 1\le i\le m\\
| |
| y_{i_r}&\le x_{j} &\quad& \text{if }\ell_{i_r}=x_{j} \\
| |
| y_{i_r}&\le 1-x_{j} &\quad& \text{if }\ell_{i_r}=\neg x_{j} \\
| |
| z_i, x_j&\in\{0,1\} &\quad& \forall 1\le i\le m, 1\le j\le n
| |
| \end{align}
| |
| \end{align}
| |
| </math> | |
| Since 3SAT is NP-hard (actually it is the first problem known to be NP-hard), generally IP is NP-hard.
| |
|
| |
|
| ==== Integral polytopes ====
| | It is obvious that <math>G\subseteq U</math> and |
| A point in an <math>n</math>-dimensional space is integral if it belongs to <math>\mathbb{Z}^n</math>, i.e., if all its coordinates are integers.
| | :<math>|G|=|H|</math>. |
|
| |
|
| A polyhedron is said to be '''integral''' if all its vertices are integral.
| | Therefore, estimation of <math>|H|</math> is reduced to estimation of <math>|G|</math> with <math>G\subseteq U</math>. Then <math>|G|</math> can have an <math>\epsilon</math>-approximation with probability <math>(1-\delta)</math> in poly-time, if we can uniformly sample from <math>U</math> and <math>|G|/|U|</math> is suitably small. |
|
| |
|
| An easy observation is that an integer programming has the same optimal solutions as its LP-relaxation when the polyhedron defined by the LP-relaxation is integral. | | An uniform sample from <math>U</math> can be implemented as follows: |
| | * generate an <math>i\in\{1,2,\ldots,m\}</math> with probability <math>\frac{|H_i|}{\sum_{i=1}^m|H_i|}</math>; |
| | * uniformly sample an <math>x\in H_i</math>, and return <math>(x,i)</math>. |
|
| |
|
| {{Theorem|Theorem (Hoffman 1974)|
| | It is easy to see that this gives a uniform member of <math>U</math>. The above sampling procedure is poly-time because each <math>|H_i|</math> can be computed in poly-time, and sampling uniformly from each <math>H_i</math> is poly-time. |
| : If a polyhedron <math>P</math> is integral then for all integer vectors <math>\boldsymbol{c}</math> there is an optimal solution to <math>\max\{\boldsymbol{c}^T\boldsymbol{x}\mid \boldsymbol{x}\in P\}</math> which is integral.
| |
| }}
| |
| {{Proof|
| |
| There always exists an optimal solution which is a vertex in <math>P</math>. For integral <math>P</math>, all vertices are integral.
| |
| }}
| |
|
| |
|
| ==== Unimodularity ====
| | We now only need to lower bound the ratio |
| {{Theorem|Definition (Unimodularity)|
| | :<math>\alpha=\frac{|G|}{|U|}</math>. |
| :An <math>n\times n</math> integer matrix <math>A</math> is called '''unimodular''' if <math>\det(A)=\pm1</math>. | |
| :An <math>m\times n</math> integer matrix <math>A</math> is called '''total unimodular''' if every square submatrix <math>B</math> of <math>A</math> has <math>\det(B)\in\{1,-1,0\}</math>, that is, every square, nonsingular submatrix of <math>A</math> is unimodular.
| |
| }}
| |
|
| |
|
| A totally unimodular matrix defines a integral convex polyhedron.
| | We claim that |
| {{Theorem|Theorem|
| | :<math>\alpha\ge\frac{1}{m}</math>. |
| :Let <math>A</math> be an <math>m\times n</math> integer matrix. | | It is easy to see this, because each <math>x\in H</math> has at most <math>m</math> instances of <math>(x,i)</math> in <math>U</math>, and we already know that <math>|G|=|H|</math>. |
| :If <math>A</math> is totally unimodualr, then for any integer vector <math>\boldsymbol{b}\in\mathbb{Z}^n</math> the polyhedron <math>\{\boldsymbol{x}\in\mathbb{R}^n\mid A\boldsymbol{x}=\boldsymbol{b}, \boldsymbol{x}\ge \boldsymbol{0}\}</math> is integral.
| |
| }}
| |
| {{Proof|
| |
| Let <math>B</math> be a basis of <math>A</math>, and let <math>\boldsymbol{b}'</math> be the corresponding coordinates in <math>\boldsymbol{b}</math>. A basic solution is formed by <math>B^{-1}\boldsymbol{b}'</math> and zeros. Since <math>A</math> is totally unimodular and <math>B</math> is a basis thus nonsingular, <math>\det(B)\in\{1,-1,0\}</math>. By [http://en.wikipedia.org/wiki/Cramer's_rule Cramer's rule], <math>B^{-1}</math> has integer entries, thus <math>B^{-1}\boldsymbol{b}'</math> is integral. Therefore, any basic solution of <math>A\boldsymbol{x}=\boldsymbol{b}, \boldsymbol{x}\ge \boldsymbol{0}</math> is integral, which means the polyhedron <math>\{\boldsymbol{x}\in\mathbb{R}^n\mid A\boldsymbol{x}=\boldsymbol{b}, \boldsymbol{x}\ge \boldsymbol{0}\}</math> is integral.
| |
| }}
| |
|
| |
|
| Our next result is the famous Hoffman-Kruskal theorem.
| | Due to the estimator theorem, this needs <math>\frac{4m}{\epsilon^2}\ln\frac{2}{\delta}</math> uniform random samples from <math>U</math>. |
| {{Theorem|Theorem (Hoffman-Kruskal 1956)|
| |
| :Let <math>A</math> be an <math>m\times n</math> integer matrix.
| |
| :If <math>A</math> is totally unimodualr, then for any integer vector <math>\boldsymbol{b}\in\mathbb{Z}^n</math> the polyhedron <math>\{\boldsymbol{x}\in\mathbb{R}^n\mid A\boldsymbol{x}\ge\boldsymbol{b}, \boldsymbol{x}\ge \boldsymbol{0}\}</math> is integral.
| |
| }}
| |
| {{Proof|
| |
| Let <math>A'=\begin{bmatrix}A & -I\end{bmatrix}</math>. We claim that <math>A'</math> is also totally unimodular. Any square submatrix <math>B</math> of <math>A</math> can be written in the following form after permutation:
| |
| :<math>B=\begin{bmatrix}
| |
| C & 0\\
| |
| D & I
| |
| \end{bmatrix}</math>
| |
| where <math>C</math> is a square submatrix of <math>A</math> and <math>I</math> is identity matrix. Therefore,
| |
| :<math>\det(B)=\det(C)\in\{1,-1,0\}</math>,
| |
| thus <math>A'</math> is totally unimodular.
| |
|
| |
|
| Add slack variables to transform the constraints to the standard form <math>A'\boldsymbol{z}=\boldsymbol{b},\boldsymbol{z}\ge\boldsymbol{0}</math>. The polyhedron <math>\{\boldsymbol{x}\mid A\boldsymbol{x}\ge\boldsymbol{b}, \boldsymbol{x}\ge \boldsymbol{0}\}</math> is integral if the polyhedron <math>\{\boldsymbol{z}\mid A'\boldsymbol{z}=\boldsymbol{b}, \boldsymbol{z}\ge \boldsymbol{0}\}</math> is integral, which is implied by the total unimodularity of <math>A'\,</math>.
| | This gives the coverage algorithm for the abstract problem of the union of sets. The DNF counting is a special case of it. |
| }}
| |
*Define the estimator random variable
::<math>Z=\frac{|U|}{N}\sum_{i=1}^N Y_i.</math>

It is easy to see that <math>\mathbf{E}[Z]=|G|</math> and we might hope that with high probability the value of <math>Z</math> is close to <math>|G|</math>. Formally, <math>Z</math> is called an '''<math>\epsilon</math>-approximation''' of <math>|G|</math> if
:<math>(1-\epsilon)|G|\le Z\le (1+\epsilon)|G|.</math>

The following theorem states that the probabilistic accuracy of the estimation depends on the number of samples and on the ratio between <math>|G|</math> and <math>|U|</math>.
{{Theorem|Theorem (estimator theorem)|
:Let <math>\alpha=\frac{|G|}{|U|}</math>. Then the Monte Carlo method yields an <math>\epsilon</math>-approximation to <math>|G|</math> with probability at least <math>1-\delta</math> provided
::<math>N\ge\frac{4}{\epsilon^2 \alpha}\ln\frac{2}{\delta}</math>.
}}
{{Proof|
Recall that <math>Y_i</math> indicates whether the <math>i</math>-th sample is in <math>G</math>. Let <math>Y=\sum_{i=1}^NY_i</math>. Then we have <math>Z=\frac{|U|}{N}Y</math>, and hence the event <math>(1-\epsilon)|G|\le Z\le (1+\epsilon)|G|</math> is equivalent to <math>(1-\epsilon)\frac{|G|}{|U|}N\le Y\le (1+\epsilon)\frac{|G|}{|U|}N</math>. Note that each <math>Y_i</math> is a Bernoulli trial that succeeds with probability <math>\frac{|G|}{|U|}</math>, thus <math>\mathbb{E}[Y]=\frac{|G|}{|U|}N</math>. The rest follows from the Chernoff bound.
}}
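The estimator just analyzed can be sketched in a few lines of Python. The instance below is a hypothetical toy example, not from the notes: <math>U</math> is the set of integers <math>\{0,\ldots,9999\}</math> and <math>G</math> is the set of multiples of 3 (so <math>|G|=3334</math>); the functions <code>sampler</code> and <code>oracle</code> play the roles of <math>\mathcal{U}</math> and <math>\mathcal{O}</math>.

```python
import math
import random

random.seed(0)  # reproducibility of this sketch

def monte_carlo_estimate(universe_size, sampler, oracle,
                         epsilon, delta, alpha_lower_bound):
    """Estimate |G| via Z = (|U|/N) * sum(Y_i), with N chosen
    by the estimator theorem; alpha_lower_bound is an a priori
    lower bound on alpha = |G|/|U|."""
    N = math.ceil(4.0 / (epsilon ** 2 * alpha_lower_bound)
                  * math.log(2.0 / delta))
    hits = sum(1 for _ in range(N) if oracle(sampler()))  # sum of the Y_i
    return universe_size * hits / N                       # Z = (|U|/N) * sum Y_i

# Hypothetical toy instance: U = {0, ..., 9999}, G = multiples of 3.
U_SIZE = 10_000
Z = monte_carlo_estimate(
    U_SIZE,
    sampler=lambda: random.randrange(U_SIZE),  # uniform sampler for U
    oracle=lambda x: x % 3 == 0,               # membership oracle for G
    epsilon=0.1, delta=0.05,
    alpha_lower_bound=0.3,                     # here alpha is about 1/3
)
```

With <math>\epsilon=0.1</math>, <math>\delta=0.05</math> and the bound <math>\alpha\ge 0.3</math>, the theorem prescribes <math>N=4919</math> samples, after which <code>Z</code> should fall within 10% of 3334 with probability at least 0.95.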
A counting algorithm for the set <math>G</math> has to deal with the following three issues:
*Implement the membership oracle <math>\mathcal{O}</math>. This is usually straightforward, or assumed by the model.
*Implement the uniform sampler <math>\mathcal{U}</math>. This can be straightforward or highly nontrivial, depending on the problem.
*Deal with exponentially small <math>\alpha=\frac{|G|}{|U|}</math>. This requires us to cleverly choose the universe <math>U</math>, which sometimes takes nontrivial ideas.
= Counting DNFs =
A disjunctive normal form (DNF) formula is a disjunction (OR) of clauses, where each clause is a conjunction (AND) of literals. For example:
:<math>(x_1\wedge \overline{x_2}\wedge x_3)\vee(x_2\wedge x_4)\vee(\overline{x_1}\wedge x_3\wedge x_4)</math>.
Note the difference from conjunctive normal forms (CNF).

Given a DNF formula <math>\phi</math> as the input, the problem is to count the number of satisfying assignments of <math>\phi</math>. This problem is '''#P-complete'''.
Naively applying the Monte Carlo method does not give a good answer. Suppose that there are <math>n</math> variables. Let <math>U=\{\mathrm{true},\mathrm{false}\}^n</math> be the set of all truth assignments of the <math>n</math> variables, and let <math>G=\{x\in U\mid \phi(x)=\mathrm{true}\}</math> be the set of satisfying assignments for <math>\phi</math>. The straightforward Monte Carlo method samples <math>N</math> assignments from <math>U</math> and checks how many of them satisfy <math>\phi</math>. This algorithm fails when <math>|G|/|U|</math> is exponentially small, namely, when only an exponentially small fraction of the assignments satisfy the input DNF formula.
;The union of sets problem
We reformulate the DNF counting problem in a more abstract framework, called the '''union of sets''' problem.
Let <math>V</math> be a finite universe. We are given <math>m</math> subsets <math>H_1,H_2,\ldots,H_m\subseteq V</math>. The following assumptions hold:
*For all <math>i</math>, <math>|H_i|</math> is computable in poly-time.
*It is possible to sample uniformly from each individual <math>H_i</math>.
*For any <math>x\in V</math>, it can be determined in poly-time whether <math>x\in H_i</math>.

The goal is to compute the size of <math>H=\bigcup_{i=1}^m H_i</math>.
DNF counting can be interpreted in this general framework as follows. Suppose that the DNF formula <math>\phi</math> is defined on <math>n</math> variables, and <math>\phi</math> contains <math>m</math> clauses <math>C_1,C_2,\ldots,C_m</math>, where clause <math>C_i</math> has <math>k_i</math> literals. Without loss of generality, we assume that in each clause, each variable appears at most once.
*<math>V</math> is the set of all assignments.
*Each <math>H_i</math> is the set of satisfying assignments for the <math>i</math>-th clause <math>C_i</math> of the DNF formula <math>\phi</math>. Then the union of sets <math>H=\bigcup_i H_i</math> gives the set of satisfying assignments for <math>\phi</math>.
*Each clause <math>C_i</math> is a conjunction (AND) of <math>k_i</math> literals. It is not hard to see that <math>|H_i|=2^{n-k_i}</math>, which is efficiently computable.
*Sampling from an <math>H_i</math> is simple: we fix the assignments of the <math>k_i</math> variables appearing in that clause so that the clause is satisfied, and sample the remaining <math>n-k_i</math> variables uniformly and independently.
*For each assignment <math>x</math>, it is easy to check whether it satisfies a clause <math>C_i</math>, thus it is easy to determine whether <math>x\in H_i</math>.
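These observations translate directly into code. The following Python sketch uses a hypothetical clause representation of my choosing (a clause is a list of <code>(variable_index, sign)</code> pairs): it computes <math>|H_i|=2^{n-k_i}</math>, samples a uniform member of <math>H_i</math>, and tests membership in <math>H_i</math>.

```python
import random

def clause_size(clause, n):
    """|H_i| = 2^(n - k_i): number of assignments satisfying
    a conjunction of k_i literals over n variables."""
    return 2 ** (n - len(clause))

def sample_from_clause(clause, n):
    """Uniformly sample an assignment satisfying the clause:
    fix the clause's literals, randomize the remaining variables."""
    assignment = [random.random() < 0.5 for _ in range(n)]
    for var, positive in clause:
        assignment[var] = positive
    return assignment

def satisfies(assignment, clause):
    """Membership oracle for H_i: does the assignment satisfy clause C_i?"""
    return all(assignment[var] == positive for var, positive in clause)

# Clause (x0 AND NOT x1 AND x2) over n = 4 variables, 0-indexed.
C = [(0, True), (1, False), (2, True)]
assert clause_size(C, 4) == 2          # 2^(4 - 3) assignments satisfy C
x = sample_from_clause(C, 4)
assert satisfies(x, C)                 # the sample always satisfies C
```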
==The coverage algorithm==
We now introduce the coverage algorithm for the union of sets problem.

Consider the multiset <math>U</math> defined by
:<math>U=H_1\uplus H_2\uplus\cdots \uplus H_m</math>,
where <math>\uplus</math> denotes the multiset union. It is more convenient to define <math>U</math> as the set
:<math>U=\{(x,i)\mid x\in H_i\}</math>.
For each <math>x\in H</math>, there may be more than one instance of <math>(x,i)\in U</math>. Among the multiple instances <math>(x,i)\in U</math> for the same <math>x\in H</math>, we choose a unique representative, namely the <math>(x,i)</math> with the minimum <math>i</math>; these representatives form a set <math>G</math>.

Formally, <math>G=\{(x,i)\in U\mid \forall (x,j)\in U, j\ge i\}</math>. Every <math>x\in H</math> corresponds to a unique <math>(x,i)\in G</math>, where <math>i</math> is the smallest index such that <math>x\in H_i</math>.
It is obvious that <math>G\subseteq U</math> and
:<math>|G|=|H|</math>.

Therefore, estimation of <math>|H|</math> is reduced to estimation of <math>|G|</math> with <math>G\subseteq U</math>. By the estimator theorem, <math>|G|</math> admits an <math>\epsilon</math>-approximation with probability <math>1-\delta</math> in poly-time, provided that we can uniformly sample from <math>U</math> and the ratio <math>|G|/|U|</math> is not too small.
A uniform sample from <math>U</math> can be generated as follows:
*generate an <math>i\in\{1,2,\ldots,m\}</math> with probability <math>\frac{|H_i|}{\sum_{i=1}^m|H_i|}</math>;
*uniformly sample an <math>x\in H_i</math>, and return <math>(x,i)</math>.

It is easy to see that this gives a uniform member of <math>U</math>. The above sampling procedure runs in poly-time because each <math>|H_i|</math> can be computed in poly-time, and sampling uniformly from each <math>H_i</math> takes poly-time.
It remains to lower bound the ratio
:<math>\alpha=\frac{|G|}{|U|}</math>.

We claim that
:<math>\alpha\ge\frac{1}{m}</math>.
This is easy to see: each <math>x\in H</math> has at most <math>m</math> instances of <math>(x,i)</math> in <math>U</math>, so <math>|U|\le m|H|</math>; since <math>|G|=|H|</math>, it follows that <math>\alpha=\frac{|G|}{|U|}\ge\frac{1}{m}</math>.
Due to the estimator theorem, <math>\frac{4m}{\epsilon^2}\ln\frac{2}{\delta}</math> uniform random samples from <math>U</math> suffice.

This gives the coverage algorithm for the abstract union of sets problem. DNF counting is a special case of it.
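Putting the pieces together, the coverage algorithm for DNF counting can be sketched in Python. This is an illustrative sketch under a hypothetical clause representation (each clause is a list of <code>(variable_index, sign)</code> pairs over <math>n</math> variables); the sample size follows the estimator theorem with <math>\alpha\ge\frac{1}{m}</math>.

```python
import math
import random

def coverage_count(clauses, n, epsilon=0.1, delta=0.05):
    """Estimate the number of satisfying assignments of a DNF formula.
    Each clause is a (hypothetical) list of (variable_index, sign) pairs."""
    sizes = [2 ** (n - len(c)) for c in clauses]   # |H_i| = 2^(n - k_i)
    total = sum(sizes)                             # |U| = sum_i |H_i|
    m = len(clauses)
    # Estimator theorem with the bound alpha >= 1/m.
    N = math.ceil(4.0 * m / epsilon ** 2 * math.log(2.0 / delta))
    hits = 0
    for _ in range(N):
        # Uniform sample from U: pick clause i with probability |H_i|/|U| ...
        i = random.choices(range(m), weights=sizes)[0]
        # ... then a uniform satisfying assignment x of clause C_i.
        x = [random.random() < 0.5 for _ in range(n)]
        for var, positive in clauses[i]:
            x[var] = positive
        # (x, i) is in G iff i is the smallest index with x in H_i.
        first = min(j for j in range(m)
                    if all(x[var] == positive for var, positive in clauses[j]))
        hits += (first == i)
    return total * hits / N    # |U| * (estimated |G|/|U|) estimates |H|

random.seed(0)  # reproducibility of this sketch
# phi = x0 OR x1 over n = 2 variables: 3 satisfying assignments.
phi = [[(0, True)], [(1, True)]]
est = coverage_count(phi, 2)
```

On this toy formula <math>|H_1|=|H_2|=2</math> and <math>|U|=4</math>, while <math>|H|=3</math> because the assignment setting both variables true is counted once; the estimate should land close to 3.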