comp.ita.br

Comp.ita.br

ITA – Computer Science Division – Brazil It is suggested here an algorithm based on stickers for the DNA Computing model [16] that solves the well known Bin-Packing Prob-lem (BPP), that belongs to the class N P-Hard in the strong sense,in a time bounded by O(n2q), where n is the quantity of items and qthe space requirements expressed in bits.
To the best of the authors’ knowledge, this is the first polynomial time algorithmic solution for the BPP in such a model.
The Bin Packing Problem (BPP) is a classical discrete mathematics prob-lem and imposes stern difficulties for solving it exactly in a von Neumannarchitecture based machine [11], since it also belongs to the N P-Hard in thestrong sense class [9]. However, if a different computational model is consid-ered, it is possible to solve BPP in a time bounded by a polynomial in thesize of the input. A seminal approach based on bio-inspired machine modelcame with Adleman [1], for a Hamiltonian Circuit Problem instance. In thesequence, Lipton [13] solved the SAT Problem, that is the very first prob-lem proved to be complete in N P. Those two works constitute a biologicalframework that is known as the Adleman-Lipton model. It is suggested herea polynomial time algorithm for solving the BPP that, to the best knowledgeof the authors, is the first bio-inspired solution for that problem. The pa-per is organized as follows: Section 2 gives an overview of the major resultsin DNA models of computation; the BPP and the algorithm for solving itwith its convergence are presented at Section 3. In Section 4 it is given theconclusions of the ideas here introduced.
1Corresponding author. This work was partially funded by FAPESP, grant 06/05325-1.
The Watson and Crick seminal result [17] on the information concerning thecodification of any type of life is represented in the DNA (deoxyribonucleicacid). The DNA is a long polymer formed by units called nucleotides thatconnect among themselves by four different types of molecules called bases:adenine (A), cytosine (C), guanine (G) and thymine (T ). To the context ofthis work, a nucleotide and its corresponding base are considered as the sameelement.
To form the DNA sequences, the nucleotides are joined among them by phosphate groups – bonds – that are asymmetric with respect of the geometryof each other, and they are referred as the 5’ (five prime) and 3’ (three prime)ends. The DNA double helix structure comes as a result of the annealingof complementary bases (A with T and C with G). The reverse process –melting – separate the double helix into two bases sequences. Moreover, asequence of pieces of DNA is composed by genes.
The human genome has near 3 billions base pairs of DNA. Each gene is composed from few thousands up to many millions of bases and it is located ina concrete position of a given chromosome. The bases sequences contains allthe necessary information for the protein biosynthesis, that has an importantrole in the cell division.
Under the computational point of view, a DNA sequence can be considered as a string formed by the combination of four symbols: A, G, C and T . Thisimplies that one can use four symbols to code information, being this quantitymore than sufficient, since in a usual computer just two digits are involved.
The following basic operations can be carried out on DNA sequences by • Cutting. An enzyme restriction endonuclease permits to recognize a small portion of the DNA. Any double helix that contains it can be cutin that exact place.
• Linking. An enzyme DNA ligase allows to join the end of a DNA sequence with the beginning of another one.
• Replication. An enzyme DNA polymerase makes possible the DNA • Destruction. With the enzyme exonuclease, it is possible to eliminate The term DNA Computing appeared in 1994 with Adleman [1]. A DNA- computer can be seen as a parallel SIMD (Single Instruction stream, MultipleData stream) such that their “biological processors” work synchronously, butwith no communication occurring between any two of them. An usual testtube has approximately 1018 DNA molecules and, in a DNA Kilogram, it canbe stored a quantity of information equivalent to 1023 bytes. Therefore, evenbeing near 109 times slower than their electronic computers counterparts,the sheer amount of processing cells, allied with the series of operationspossible to be carried out over the molecules, can configure in a near futurea feasible alternative to efficiently solve many computational problems, eventhe intractable ones.
In the last few years, a series of articles on both the codification and solution of difficult problems via DNA Computing appeared in the literature:Maximal Clique [15]; 3-SAT [13]; Subgraph Isomorphism [12], 3-Coloring,Maximal Clique and Independent-Set [2]; Dominating-Set [10]; 3-DimensionalMatching and Set-Packing [3]; Set-Covering [4]; Subset-Sum [5]; Knapsack[6, 7, 14]; 3-SAT, 3-Coloring and Independent-Set [8].
In spite of the fact that all of these results are theoretical ones, without the laboratory practical tests, it is possible to prove that the total number ofbiological operations necessary to solve those problems is bounded by a poly-nomial in terms of processing time. This indicates that this computationalmodel maybe seen as an efficient alternative to solve problems belonging tothe N P class.
The results in Adleman [1] and Lipton [13], as mentioned before, compose asystem of biological operations that are known as the Adleman-Lipton model.
In this model, there exists a set of test tubes with DNA’s sequences, i.e., a multi-set of finite strings over the alphabet {A, C, G, T }. The biologicaloperations within this model, that form the basic steps of a DNA-computer ,are defined by the following: • Extract. Given a test tube T and a sequence S, this operation gener- ates two tubes: +(T, S) with all the sequences in T that had S as asubsequence, and −(T, S) with the remaining sequences of T .
• Merge. Given two test tubes T1 and T2, this operation generates a new tube with the content of both, that is, it is equivalent to decant the contents of the T1 and T2 into a third one without modifying nomolecule.
• Detect. Given a tube T , this operation returns the logic value yes if there is at least a DNA molecule in it, and no otherwise.
• Discard. Given a test tube T , this operation simply discards it.
• Amplify. Given a test tube T , amplif y(T, T1, T2) produces two new test tubes T1 and T2 that are identical copies of T , and this latter onebecomes empty.
• Append. Given a test tube T and a sequence S, append(T, S) affixes S One of the most important problems with these biological operations is the high error rate when of their executions. A large variety of suggestionsappeared to overcome such type of error and the most relevant one comeswith the use of stickers [16], summarized next.
The sticker-based model [16] simulates a computer with a binary memory inthe following way: • There is a new way to present the information: each bit corresponds to a subsequence of k nucleotides, called a sticker.
• The computer memory is formed by a sequence of stickers: those that do have a match can be associated to a bit with the value 1. If such amatching does not occur, then the corresponding bit receives the value0.
• Each sticker differs significantly from the others, and it exactly matches Due to the concrete way in which the stickers possess their nucleotides order it is possible to sensibly diminish the DNA’s handling errors, where themost common one is the hairpin (subsequences matches occurring within asame molecule). In that same article, the authors presented the feasibility oftheir results via a series of experiments obtained in the practice by the fourbiological operations, described next: • Combine: Given two test tubes T1 and T2 filled with DNA´s, this oper- ation gives a third test tube T3 with a union of the first two (eventuallythis tube might be even one of the previous one). The operation cor-responds to a merge to the original model.
• Separate: Given a test tube T and a given bit in position i, this opera- tion separates DNA’s sequences into two groups: in test tube T1, thosewith a value 1 in that position, and in test tube T0 the remaining ones.
Moreover, the entire content of T is discarded.
• Set: Given a test tube T and a particular bit at position i, all the DNA’s sequences receive a sticker match that corresponds to that bit.
• Clear: Given a test tube T and a particular bit at position i, all the DNA’s sequences with eventual stickers are removed in such a way thatthere is no match at that position.
Some recent publications [4, 5, 10] suggested solution procedures based onAdleman-Lipton model for some N P-Complete problems with variations inrelation to the original model. That is, the stickers were used to representthe bits, but without using the specific original inspired biological operations:separate, set and clear. In reality, those authors proceeded to an adaptationof the original model: the bits are not represented by stickers matching butby nucleotides sequences. In other words, to those approaches all stickers arematched with specific stickers to represent the binary values. This indicatesthat those types of approaches can generate hybrid models and they leaveopen further possibilities for practical implementations.
The algorithm introduced here is based on the original formulation of stickers[16], but it is important to note that it also uses the amplif y, append and detect biological operations of the Adleman-Lipton model. Moreover, to thebest knowledge of the authors, it is not know any solution for the BPP basedon DNA Computing.
The classical one-dimensional BPP – that is known to be N P-Hard in the strong sense – is as given a set of n items a0, a1, . . . , an−1 with their respectiveweights w(ai) = xi ∈ (0, c], 0 ≤ i < n. The aim is to allocate all items intobins with equal capacity c and by using a minimum number of them. It isassumed that no packing of items can surpass the bin capacity c and it isavailable in principle n empty bins.
Admitting that the numerical values for the problem are represented by q bits – or equivalently in the DNA model by a sequence of q stickers –, it ispossible to design a brute-force algorithms that solve the BPP in polynomialtime. The basic idea is to generate all possible solution, one to each DNAsequence, eliminate the infeasible ones and then to choose the best one.
Albeit simple and obvious as the ideas may seem, their actual algorithmicimplementations are very elaborate and non-trivial.
Before the presentation of the actual code of the biological operations based algorithm, three procedures, incr(), add() and compare() are necessaryto be described in order to show how the DNA’s sequences are modified whena test tube T is passed as a parameter. The comments within the proceduresfollows the C-Language notation, i.e., they are represented by “//”.
Procedure incr(T, b) adds a unit in the binary number from the bth bit in the DNA sequences that are in T . As the algorithm is executed, test tubeTF stores the sequences that have already been incremented. By the end ofthe procedure, T receives back all the sequences.
//Increments the number stored from the bit b onwards.
incr(T, b) (T0, T1) ← (T, b + i)set(T0, b + i)TF ← TF ∪ T0clear(T1, b + i)T ← T1i ← i − 1 until (i = −1) or (detect(T ) = no)T ← TF It is easy to observe that the time complexity of this procedure is O(q).
Procedure add(T, b, X) adds the value X from bit b up to the end of sequencesin T . The operation is carried out in a bit-to-bit conventional manner. Ateach step of the loop, TC and T ¯ C do respectively have the sequences that a carry will be considered or not in the next bit addition. X(i) means the ithbit in the binary representation of number X.
//Adds value X to the already stored number from bit b onwards.
add(T, b, X) clear(TC1, b + i)set(TC0, b + 1)TC ← TC1T ¯ ← The time complexity is clearly O(l).
Finally, the third procedure divides the subsequences in accordance with thevalues of the represented numbers. Whenever compare(T, bA, bB, T1, T2, T3)is invoked, test tubes T1, T2 and T3 will receive respectively sequences inwhich A < B, A = B or A > B, where A and B are the numbers storedfrom the bits bA and bB onwards.
// Compare the numbers A and B, starting from the bits bA e bB// T1: store sequences such that A < B// T2: store sequences such that A = B// T3: store sequences such that A > Bcompare(T, bA, bB, T1, T2, T3) (TA0, TA1) ← (T, bA + i)(TA1B0, TA1B1) ← (TA1, bB + i)(TA0B0, TA0B1) ← (TA0, bB + i)T3 ← T3 ∪ TA1B0T1 ← T1 ∪ TA0B1T ← TA0B0 ∪ TA1B1i ← i + 1 until (i = q) or (detect(T ) = no)T2 ← T Again, as to the previous procedures, this one also has a time complexity of O(q). The definitions of these three procedures given above were necessary tobe done since they are used in the DNA Computing based codification of thealgorithm for solving the BPP and that is presented next. For sake of clarity,it is assumed that seq(i) is the subsequence of q stickers that represents thenumber i.
The algorithm for solving the BPP generates all nn combinations of weightsand then the best solution is chosen. It is also necessary to discard unfeasiblecombinations of items, that is, those that possess repetition of the same item.
These ideas are presented in the next six phases.
(a) Generation of the nn combination of all values between 0 and n−1, that will correspond to the eventual solutions for the problem. To each andevery combination, the value j at the ith position means the allocationof weight xi at the jth bin, where 0 ≤ i, j < n.
(b) At the end of each one of these combinations, it is appended a se- quence of the numbers 0, 1, 2, . . . , n − 1, c that will be used in futurecalculations.
(c) Evaluate and append at the end of the previous sequence the n accu- (d) Discard the unfeasible combinations, that is, those whose total weight (e) For the remaining solutions it is counted the number of bins that are (f) Find the best solution, i.e., the one with the minimum number of empty In terms of quantity of information demanded by the algorithm, every DNA sequence will have (3n + 2)q stickers. The 3n + 2 numbers with q bitscomes from the following sequence: • n values between 0 and n − 1 (to represent a permutation); • the number 0, 1, 2, . . . , n − 1, c; • the n accumulated totals in every bin; • a value between 0 and n (to represent the number of empty bins).
In what follows, it is presented how the six phases given previously solve the BPP by using a DNA-computer. T is a test tube with the DNA’s se-quences.
//Algorithm for solving the BPPbin-packing(T ) generate(T )numbers(T )weights(T )prune(T )empty(T )find(T ) (a) Generation of all nn possible solutions.
algorithm given below, n copies of the already existing sequences are createdand, at the end of each one of them, it is appended as a suffix a distinct valuewhose range is between 0 and n − 1.
amplif y(T0, T2j−1, T2j)// Generate n copies Ti, 0 < i ≤ n //seq(i) at the end of Ti+1, 0 ≤ i < n The time complexity is clearly bounded by above by O(n2q).
(b) Addition of the numbers 0, 1, 2, . . . , n − 1 and of the limit c.
this phase, it is necessary to use n + 1 times the operation append.
(c) Determination of the resulting weights for each recipient.
determination is evaluated through a loop for each recipient: first, the value0 is appended as a suffix of each sequence; then, it is verified which items are stored in that specific recipient by using procedure compare, and theirweights are added if this is the case.
append(T, seq(0))for j ← 0 to n − 1 do compare(T, qj, q(n + i), T1, T2, T3)add(T2, q(2n + 1 + j), xi)T ← T1 ∪ T2T ← T ∪ T3 (d) The unfeasible solutions discard.
sequences for which at least a recipient stores a value greater than c.
compare(T, 2nq, q(2n + 1 + i), T1, T2, T3)T ← T2 ∪ T3 The time complexity is clearly bounded by O(n).
value 0 at the end of all sequences. In the for loop, a counting processis carried out: sequences that are not modified are counted as an emptyrecipient, and this is equivalent to determining bins with no items allocatedto them.
compare(T, nq, q(2n + 1 + i), T1, T2, T3)incr(T2, q(3n + 1))T ← T1 ∪ T2T ← T ∪ T3 sequence with the largest quantity of empty recipients.
compare(T, q(n + i), q(3n + 1), T1, T2, T3)if detect(T2) =no then The time complexities of all procedures of the suggested algorithm are sum-marized in the next table.
From the partial complexities of each phase, it is possible to determine that the one-dimensional BPP, solved by stickers and biological operationsdefined in the Adleman-Lipton model, is bounded by O(n2q) in time, wheren is the quantity of items and q is the number of bits necessary to store thenumerical values that are associated to the stickers. Since the value of qis the quantity of bits and it is not a unary representation of the involvedcoefficients, the algorithm is polynomial in time cf. [9].
It was presented an algorithm for solving the well known BPP in a polynomialtime on the quantity of weights. The problem is N P-Hard in the strongsense and there is no known exact algorithm with that time bound in a vonNeumann architecture or in a theoretical setting to a non-equivalent Non-Deterministic Turing Machine. A hybrid model that combines the Adleman-Lipton and stickers features was also given, with all the basic bio-inspiredoperations been defined too. It is important to note that the space is boundedby nn DNA sequences and, depending on the value of n, this would be animpeditive factor for its practical implementation, even considering the veryhuge amount indeed of molecules (equivalent to the computer memory) ina single test tube. Another important observation that deserves mentionis that the suggested approach can be viewed as a general setting to solvein quadratic time any problem whose solution depends on permutations.
This can be accomplished by changing the corresponding objective function,whenever it can be evaluated also in polynomial-time, accordingly.
[1] L. Adleman, “Molecular computation of solutions to combinatorial prob- lems”, Science, 266, pp. 1021-1024, 1994.
[2] M. Amos, “DNA Computation”, Ph.D. Thesis, Department of Com- puter Science, The University of Warwick, 1997.
[3] W.-L. Chang and M. Guo, “Resolving the 3-dimensional matching prob- lem and the set-packing problem in Adleman-Lipton’s model”, IASTED International Conference, Networks, Parallel and Distributed Processingand Applications, Japan, pp. 455-460, 2002.
[4] W.-L. Chang and M. Guo, “Solving the set-cover problem and the prob- lem of exact cover by 3-sets in the Adleman-Lipton’s model”, BioSys-tems, 72, 263-275, 2003.
[5] W.-L. Chang, M. Ho and M. Guo, “Molecular solution for the subset- sum problem on DNA-based supercomputing”, BioSystems, 73, pp. 117-130, 2004.
[6] M. Darehmiraki and H. M. Nehi, “Molecular solution to the 0-1 knapsack problem based on DNA computing”, Applied Mathematics and Compu-tation, 187, pp. 1033-1037, 2007.
[7] M. Darehmiraki and H. M. Nehi, “A surface-based DNA algorithm for the solving binary knapsack problem”, Applied Mathematics and Com-putation, 188, pp. 1991-1994, 2007.
[8] B. Fu, “Volume bounded molecular computation”, Ph.D. Thesis, De- partment of Computer Science, Yale University, 1997.
[9] R. Garey and D.S. Johnson, Computers and Intractability: A guide to the theory of NP-Completeness, Freeman, New York, 1979.
[10] M. Guo, M. Ho and W.-L. Chang, “Fast parallel molecular solution to the dominating-set problem on massively parallel bio-computing”,Parallel Computing, 30, pp. 1109-1125, 2004.
[11] J.L. Hennessy, D.A. Patterson and D. Goldberg, Computer Architecture: A Quantitative Approach, Morgan Kaufmann, 2002.
[12] S.-Y. Hsieh, C.-W. Huang and H.-H. Chou, A DNA-based graph en- coding scheme with its applications to graph isomorphism problems,Applied Mathematics and Computation, 203, pp. 502-512, 2008.
[13] R.J. Lipton, “DNA solutions of hard computational problems”, Science, enez, F. Sancho-Caparrini, “Solving knapsack problems in a sticker-based model”, Lecture Notes in Computer Science, 2340, pp.
161-171, 2002.
[15] Q. Quyang, P.D. Kaplan, S. Liu, A. Libchaber, “DNA solution of the maximal clique problem”, Science, 278, pp. 446-449, 1997.
[16] S. Roweis, E. Winfree, R. Burgoyne, N. Chelyapov, M. Goodman, P.
Rothemund, L. Adleman, “A sticker-based model for DNA Computa-tion”, Journal of Computational Biology, 5, pp. 615-629, 1998.
[17] J. Watson and F. Crick, “Molecular structure of nucleic acids; a struc- ture for deoxyribose nucleic acid”, Nature, 171 (4356), pp. 737-8, 1953.

Source: http://www.comp.ita.br/~alonso/public/amc09.pdf

Layout

Congratulations on having registered in The Landmark Forum. The Landmark Forum is an enquiry into one’s relationship to life—to one’s self, family, teachers, school and peers. The Landmark Forum is designed as an opportunity for people to be more powerful, freely expressed and effective in dealing with life. • PARTICIPANTS AND PARENTS: Each one of you will have sections of this form t

Microsoft word - 130410 carantion barosso.doc

STICHTING TER VOORKOMING MISBRUIK GENETISCHE MANIPULATIE (VoMiGEN) Foundation for the prevention of Abuse of Genetic Manipulation (VoMiGEN) Van Speykstraat 87 - 3014 VE Rotterdam. (Niederlande) www.wirsinduberall.de und www.vomigen.eu www.gentechvrij.nl e-mail: vomigen@vomigen.eu KvK te Rotterdam onder nr. 24290161 The “demos” of Europe say no to GMO. Carnation SNIF C/NL/13/01: