SlideShare a Scribd company logo
1 of 29
Download to read offline
Algorithms on Strings
Michael Soltys
CSU Channel Islands
Computer Science
February 4, 2015
Strings - Soltys Math/CS Seminar Title - 1/27
String problems are at the heart of Computer Science:
Rewriting systems are Turing complete
In practice analysis of strings is central to:
Algorithmic biology
Text processing
Language theory
Coding theory
Strings - Soltys Math/CS Seminar Introduction - 2/27
Basics (COMP 454)
An alphabet is a finite, non-empty set of distinct symbols, denoted
usually by Σ.
e.g., Σ = {0, 1} (binary alphabet)
Σ = {a, b, c, . . . , z} (lower-case letters alphabet)
A string, also called word, is a finite ordered sequence of symbols
chosen from some alphabet.
e.g., 010011101011
|w| denotes the length of the string w.
e.g., |010011101011| = 12
The empty string, ε, |ε| = 0, is in any Σ by default.
Strings - Soltys Math/CS Seminar Introduction - 3/27
Σk is the set of strings over Σ of length exactly k.
e.g., If Σ = {0, 1}, then
Σ0
= {ε}
Σ1
= Σ
Σ2
= {00, 01, 10, 11}, etc. |Σk
|?
Kleene’s star Σ∗ is the set of all strings over Σ.
Σ∗ = Σ0 ∪ Σ1
∪ Σ2
∪ Σ3
∪ . . .
=Σ+
Concatenation If x, y are strings, and x = a1a2 . . . am &
y = b1b2 . . . bn ⇒ x · y = xy
juxtaposition
= a1a2 . . . amb1b2 . . . bn
UNIX cat command
Strings - Soltys Math/CS Seminar Introduction - 4/27
A language L is a collection of strings over some alphabet Σ, i.e.,
L ⊆ Σ∗. E.g.,
L = {ε, 01, 0011, 000111, . . .} = {0n
1n
|n ≥ 0} (1)
Note:
wε = εw = w.
{ε} = ∅; one is the language consisting of the single string ε,
and the other is the empty language.
Strings - Soltys Math/CS Seminar Introduction - 5/27
Consider L = {w| w is of the form x01y ∈ Σ∗ } where Σ = {0, 1}.
We want to specify a DFA A = (Q, Σ, δ, q0, F) that accepts all and
only the strings in L.
Σ = {0, 1}, Q = {q0, q1, q2}, and F = {q1}.
Transition diagram
q
1 0 0,1
10
q0 q2 1
Transition table
0 1
q0 q2 q0
q1 q1 q1
q2 q2 q1
Strings - Soltys Math/CS Seminar Introduction - 6/27
A context-free grammar (CFG) is G = (V , T, P, S) — Variables,
Terminals, Productions, Start variable
Ex. P −→ ε|0|1|0P0|1P1.
Ex. G = ({E, I}, T, P, E) where T = {+, ∗, (, ), a, b, 0, 1} and P is
the following set of productions:
E −→ I|E + E|E ∗ E|(E)
I −→ a|b|Ia|Ib|I0|I1
If αAβ ∈ (V ∪ T)∗, A ∈ V , and A −→ γ is a production, then
αAβ ⇒ αγβ. We use
∗
⇒ to denote 0 or more steps.
L(G) = {w ∈ T∗|S
∗
⇒ w}
Strings - Soltys Math/CS Seminar Introduction - 7/27
Context-sensitive grammars (CSG) have rules of the form:
α → β
where α, β ∈ (T ∪ V )∗ and |α| ≤ |β|. A language is context
sensitive if it has a CSG.
Fact: It turns out that CSL = NTIME(n)
A rewriting system (also called a Semi-Thue system) is a grammar
where there are no restrictions; α → β for arbitrary
α, β ∈ (V ∪ T)∗.
Fact: It turns out that a rewriting system corresponds to the most
general model of computation; i.e., a language has a rewriting
system iff it is “computable.”
Strings - Soltys Math/CS Seminar Introduction - 8/27
A second course in Automata
Chomsky-Schutzenberger Theorem: If L is a CFL, then there
exists a regular language R, an n, and a homomorphism h, such
that L = h(PARENn ∩ R).
Parikh’s Theorem: If Σ = {a1, a2, . . . , an}, the signature of a
string x ∈ Σ∗ is (#a1(x), #a2(x), . . . , #an(x)), i.e., the number of
ocurrences of each symbol, in a fixed order. The signature of a
language is defined by extension; regular and CFLs have the same
signatures.
Strings - Soltys Math/CS Seminar Introduction - 9/27
This presentation is about algorithms on strings.
Based on two papers that are coming out in the next months:
Neerja Mhaskar and Michael Soltys
Non-repetitive strings over alphabet lists
to appear in WALCOM, February 2015.
Neerja Mhaskar and Michael Soltys
String Shuffle: Circuits and Graphs
accepted in the Journal of Discrete Algorithms, 2015
Both at http://soltys.cs.csuci.edu (papers 3 & 19)
Strings - Soltys Math/CS Seminar Introduction - 10/27
Non-repetitive strings
A word is non-repetitive if it does not contain a subword of the
form vv.
Word with repetition 010101110
Word without repetition 101
Easy observation: what is the smallest n so that any word over
Σ = {0, 1} of length ≥ n has at least one repetition?
Strings - Soltys Math/CS Seminar Non-repetitive strings - 11/27
Original Thue problem
For Σ3 = {1, 2, 3} and morphism, due to A. Thue:
S =



1 → 12312
2 → 131232
3 → 1323132
Given a string w ∈ Σ∗
3, we let S(w) denote w with every symbol
replaced by its corresponding substitution:
S(w) = S(w1w2 . . . wn) = S(w1)S(w2) . . . S(wn)
Lemma: If w is non-repetitive then so is S(w).
Strings - Soltys Math/CS Seminar Non-repetitive strings - 12/27
Problem extended to alphabet lists
List of alphabets L = L1, L2, . . . , Ln
Can we generate non-repetitive words
w = w1w2 . . . wn, such that the symbol wi ∈ Li ?
Studied by: [GKM10], [Sha09], and it is a natural extension of the
original problem posed and solved by A. Thue.
E.g., L1 = {a, b, c}, L2 = {b, c, d}, L3 = {a, d, 2}, in this case
w = ac2 is over L1, L2, L3 and non-repetitive.
Is that true for any list where |Li | = 3 for all i?
Strings - Soltys Math/CS Seminar Non-repetitive strings - 13/27
[GKM10] shows that this can be done for |Li | = 4 for all i with this
algorithm:
pick any w1 ∈ L1
for i + 1 (w = w1w2 . . . wi is non-repetitive) pick a ∈ Li+1
if wa is non-repetitive, then let wi+1 = a
if wa has a square vv, then
vv must be a suffix
delete the right copy of v from w, and restart.
Using sophisticated Lov´asz Local Lemma argument and Catalan
numbers we can show that the above algorithm succeeds with
non-zero probability.
Strings - Soltys Math/CS Seminar Non-repetitive strings - 14/27
Particular “yes” cases for L1, L2, . . . , Ln
Has a system of distinct representatives (SDR)
Has the union property
Can be mapped consistently to Σ3 = {1, 2, 3}
It is a partition
Strings - Soltys Math/CS Seminar Non-repetitive strings - 15/27
Open Problem 1
Given any list L1, L2, . . . , Ln, where |Li | = 3, can we always find a
non-repetitive string w over such a list?
Strings - Soltys Math/CS Seminar Non-repetitive strings - 16/27
Shuffle
w is the shuffle of u, v: w = u v
w = 0110110011101000
u = 01101110
v = 10101000
w = 0110110011101000
Strings - Soltys Math/CS Seminar Shuffle - 17/27
Shuffle
w is the shuffle of u, v: w = u v
w = 0110110011101000
u = 01101110
v = 10101000
w = 0110110011101000
w is a shuffle of u and v provided:
u = x1x2 · · · xk
v = y1y2 · · · yk
and w obtained by “interleaving” w = x1y1x2y2 · · · xkyk.
Strings - Soltys Math/CS Seminar Shuffle - 17/27
Square Shuffle
w is a square provided it is equal to a shuffle of a u with itself, i.e.,
∃u s.t. w = u u
The string w = 0110110011101000 is a square:
w = 0110110011101000
and
u = 01101100 = 01101100
Strings - Soltys Math/CS Seminar Shuffle - 18/27
Result from 2013
given an alphabet Σ, |Σ| ≥ 7,
Square = {w : ∃u(w = u u)}
is NP-complete.
Strings - Soltys Math/CS Seminar Shuffle - 19/27
Result from 2013
given an alphabet Σ, |Σ| ≥ 7,
Square = {w : ∃u(w = u u)}
is NP-complete.
What we leave open:
What about |Σ| = 2 (for |Σ| = 1, Square is just the set of
even length strings)
What about if |Σ| = ∞ but each symbol cannot occur more
often than, say, 6 times (if each symbol occurs at most 4
times, Square can be reduced to 2-Sat – see P. Austrin
Stack Exchange post http://bit.ly/WATco3)
Strings - Soltys Math/CS Seminar Shuffle - 19/27
Open Problem 2
Is Square NP-complete for alphabets of size {2, 3, 4, 5, 6} ?
Strings - Soltys Math/CS Seminar Shuffle - 20/27
Upper and lower bounds
Shuffle(x, y, w) holds if and only if w is a shuffle of x, y
Shuffle ∈ AC0
, but Shuffle ∈ AC1
.
Strings - Soltys Math/CS Seminar Shuffle - 21/27
Upper bound
Strings - Soltys Math/CS Seminar Shuffle - 22/27
Lower bound
Parity(x) =
0 ≤ i ≤ |x|
i is odd
Shuffle(0|x|−i
, 1i
, x).
Strings - Soltys Math/CS Seminar Shuffle - 23/27
n−i
i=1 i=3 i=5 i=n
0 x 1 1 10 0 0x x x1
ii n−i i in−i n−i
Strings - Soltys Math/CS Seminar Shuffle - 24/27
Open Problem 3
Is Shuffle in NC1
?
Strings - Soltys Math/CS Seminar Shuffle - 25/27
Announcement of two upcoming seminars
1. February 16, 2015, 6:00-7:00pm
Bell Tower 1471
Ryszard Janicki
On Pairwise Comparisons Based Rankings
2. February 16, 2015, 7:00-8:00pm
Bell Tower 1471
Neerja Mhaskar
Repetition in Strings and String Shuffles
Computer Science Seminars:
http://compsci.csuci.edu/degrees/seminars.htm
Strings - Soltys Math/CS Seminar Conclusion - 26/27
References
Jaroslaw Grytczuk, Jakub Kozik, and Pitor Micek.
A new approach to nonrepetitive sequences.
arXiv:1103.3809, December 2010.
Jeffrey Shallit.
A second course in formal languages and automata theory.
Cambridge Univeristy Press, 2009.
Strings - Soltys Math/CS Seminar References - 27/27

More Related Content

What's hot

January 16, 2014
January 16, 2014January 16, 2014
January 16, 2014khyps13
 
Vector Space & Sub Space Presentation
Vector Space & Sub Space PresentationVector Space & Sub Space Presentation
Vector Space & Sub Space PresentationSufianMehmood2
 
How to solve recurrence equation
How to solve recurrence equationHow to solve recurrence equation
How to solve recurrence equationHanpenRobot
 
Distance of a point from a line
Distance of a point from a lineDistance of a point from a line
Distance of a point from a linesumanmathews
 
MATRICES
MATRICESMATRICES
MATRICESdaferro
 
February 16 2016
February 16 2016February 16 2016
February 16 2016khyps13
 
Convex Partitioning of a Polygon into Smaller Number of Pieces with Lowest Me...
Convex Partitioning of a Polygon into Smaller Number of Pieces with Lowest Me...Convex Partitioning of a Polygon into Smaller Number of Pieces with Lowest Me...
Convex Partitioning of a Polygon into Smaller Number of Pieces with Lowest Me...Kasun Ranga Wijeweera
 
A Nonstandard Study of Taylor Ser.Dev.-Abstract+ Intro. M.Sc. Thesis
A Nonstandard Study of Taylor Ser.Dev.-Abstract+ Intro. M.Sc. ThesisA Nonstandard Study of Taylor Ser.Dev.-Abstract+ Intro. M.Sc. Thesis
A Nonstandard Study of Taylor Ser.Dev.-Abstract+ Intro. M.Sc. ThesisIbrahim Hamad
 
Linear function and slopes of a line
Linear function and slopes of a lineLinear function and slopes of a line
Linear function and slopes of a lineJerlyn Fernandez
 
Graphs linear equations and functions
Graphs linear equations and functionsGraphs linear equations and functions
Graphs linear equations and functionsanwarsalamappt
 
M1 L5 Remediation Notes
M1 L5 Remediation NotesM1 L5 Remediation Notes
M1 L5 Remediation Notestoni dimella
 
Definition Vector space
Definition Vector spaceDefinition Vector space
Definition Vector spaceROHAN GAIKWAD
 

What's hot (20)

January 16, 2014
January 16, 2014January 16, 2014
January 16, 2014
 
Vector spaces
Vector spaces Vector spaces
Vector spaces
 
Unit 3 review
Unit 3 reviewUnit 3 review
Unit 3 review
 
Vector Space & Sub Space Presentation
Vector Space & Sub Space PresentationVector Space & Sub Space Presentation
Vector Space & Sub Space Presentation
 
Math10 q1 week2.arithmetic sequence1
Math10 q1 week2.arithmetic sequence1Math10 q1 week2.arithmetic sequence1
Math10 q1 week2.arithmetic sequence1
 
TOC 3 | Different Operations on DFA
TOC 3 | Different Operations on DFATOC 3 | Different Operations on DFA
TOC 3 | Different Operations on DFA
 
Writing linear equations
Writing linear equationsWriting linear equations
Writing linear equations
 
How to solve recurrence equation
How to solve recurrence equationHow to solve recurrence equation
How to solve recurrence equation
 
Distance of a point from a line
Distance of a point from a lineDistance of a point from a line
Distance of a point from a line
 
MATRICES
MATRICESMATRICES
MATRICES
 
matricesMrtices
matricesMrticesmatricesMrtices
matricesMrtices
 
February 16 2016
February 16 2016February 16 2016
February 16 2016
 
Convex Partitioning of a Polygon into Smaller Number of Pieces with Lowest Me...
Convex Partitioning of a Polygon into Smaller Number of Pieces with Lowest Me...Convex Partitioning of a Polygon into Smaller Number of Pieces with Lowest Me...
Convex Partitioning of a Polygon into Smaller Number of Pieces with Lowest Me...
 
A Nonstandard Study of Taylor Ser.Dev.-Abstract+ Intro. M.Sc. Thesis
A Nonstandard Study of Taylor Ser.Dev.-Abstract+ Intro. M.Sc. ThesisA Nonstandard Study of Taylor Ser.Dev.-Abstract+ Intro. M.Sc. Thesis
A Nonstandard Study of Taylor Ser.Dev.-Abstract+ Intro. M.Sc. Thesis
 
MB1105 QR
MB1105 QRMB1105 QR
MB1105 QR
 
Linear function and slopes of a line
Linear function and slopes of a lineLinear function and slopes of a line
Linear function and slopes of a line
 
Axioms, postulates
Axioms, postulatesAxioms, postulates
Axioms, postulates
 
Graphs linear equations and functions
Graphs linear equations and functionsGraphs linear equations and functions
Graphs linear equations and functions
 
M1 L5 Remediation Notes
M1 L5 Remediation NotesM1 L5 Remediation Notes
M1 L5 Remediation Notes
 
Definition Vector space
Definition Vector spaceDefinition Vector space
Definition Vector space
 

Viewers also liked

The proof complexity of matrix algebra - Newton Institute, Cambridge 2006
The proof complexity of matrix algebra - Newton Institute, Cambridge 2006The proof complexity of matrix algebra - Newton Institute, Cambridge 2006
The proof complexity of matrix algebra - Newton Institute, Cambridge 2006Michael Soltys
 
Feasible Combinatorial Matrix Theory - LICS2013 presentation
Feasible Combinatorial Matrix Theory - LICS2013 presentationFeasible Combinatorial Matrix Theory - LICS2013 presentation
Feasible Combinatorial Matrix Theory - LICS2013 presentationMichael Soltys
 
La, permutations, and the hajós calculus - ICALP 2004
La, permutations, and the hajós calculus - ICALP 2004La, permutations, and the hajós calculus - ICALP 2004
La, permutations, and the hajós calculus - ICALP 2004Michael Soltys
 
A formal framework for Stringology
A formal framework for StringologyA formal framework for Stringology
A formal framework for StringologyMichael Soltys
 
Fair ranking in competitive bidding procurement: a case analysis
Fair ranking in competitive bidding procurement: a case analysisFair ranking in competitive bidding procurement: a case analysis
Fair ranking in competitive bidding procurement: a case analysisMichael Soltys
 
The proof theoretic strength of the Steinitz exchange theorem - EACA 2006
The proof theoretic strength of the Steinitz exchange theorem - EACA 2006The proof theoretic strength of the Steinitz exchange theorem - EACA 2006
The proof theoretic strength of the Steinitz exchange theorem - EACA 2006Michael Soltys
 
Unambiguous functions in logarithmic space - CiE 2009
Unambiguous functions in logarithmic space - CiE 2009Unambiguous functions in logarithmic space - CiE 2009
Unambiguous functions in logarithmic space - CiE 2009Michael Soltys
 
Feasible Combinatorial Matrix Theory - MFCS2013
Feasible Combinatorial Matrix Theory - MFCS2013Feasible Combinatorial Matrix Theory - MFCS2013
Feasible Combinatorial Matrix Theory - MFCS2013Michael Soltys
 
The Proof Complexity of Linear Algebra - LICS 2002
The Proof Complexity of Linear Algebra - LICS 2002The Proof Complexity of Linear Algebra - LICS 2002
The Proof Complexity of Linear Algebra - LICS 2002Michael Soltys
 
Perceptions of Foundational Knowledge by CS students - WCCCE 2012
Perceptions of Foundational Knowledge by CS students - WCCCE 2012Perceptions of Foundational Knowledge by CS students - WCCCE 2012
Perceptions of Foundational Knowledge by CS students - WCCCE 2012Michael Soltys
 
Games on Posets - CiE 2008
Games on Posets - CiE 2008Games on Posets - CiE 2008
Games on Posets - CiE 2008Michael Soltys
 
Boolean Programs and Quantified Propositional Proof System -
Boolean Programs and Quantified Propositional Proof System - Boolean Programs and Quantified Propositional Proof System -
Boolean Programs and Quantified Propositional Proof System - Michael Soltys
 
How Safe is your Data?
How Safe is your Data?How Safe is your Data?
How Safe is your Data?Michael Soltys
 
Circuit Complexity of Shuffle - IWOCA 2013
Circuit Complexity of Shuffle - IWOCA 2013Circuit Complexity of Shuffle - IWOCA 2013
Circuit Complexity of Shuffle - IWOCA 2013Michael Soltys
 
An algorithmic view of Computer Science
An algorithmic view of Computer ScienceAn algorithmic view of Computer Science
An algorithmic view of Computer ScienceMichael Soltys
 
Feasible proofs of matrix identities with csanky's algorithm - CSL 2005
Feasible proofs of matrix identities with csanky's algorithm - CSL 2005Feasible proofs of matrix identities with csanky's algorithm - CSL 2005
Feasible proofs of matrix identities with csanky's algorithm - CSL 2005Michael Soltys
 

Viewers also liked (17)

Intro to Cryptography
Intro to CryptographyIntro to Cryptography
Intro to Cryptography
 
The proof complexity of matrix algebra - Newton Institute, Cambridge 2006
The proof complexity of matrix algebra - Newton Institute, Cambridge 2006The proof complexity of matrix algebra - Newton Institute, Cambridge 2006
The proof complexity of matrix algebra - Newton Institute, Cambridge 2006
 
Feasible Combinatorial Matrix Theory - LICS2013 presentation
Feasible Combinatorial Matrix Theory - LICS2013 presentationFeasible Combinatorial Matrix Theory - LICS2013 presentation
Feasible Combinatorial Matrix Theory - LICS2013 presentation
 
La, permutations, and the hajós calculus - ICALP 2004
La, permutations, and the hajós calculus - ICALP 2004La, permutations, and the hajós calculus - ICALP 2004
La, permutations, and the hajós calculus - ICALP 2004
 
A formal framework for Stringology
A formal framework for StringologyA formal framework for Stringology
A formal framework for Stringology
 
Fair ranking in competitive bidding procurement: a case analysis
Fair ranking in competitive bidding procurement: a case analysisFair ranking in competitive bidding procurement: a case analysis
Fair ranking in competitive bidding procurement: a case analysis
 
The proof theoretic strength of the Steinitz exchange theorem - EACA 2006
The proof theoretic strength of the Steinitz exchange theorem - EACA 2006The proof theoretic strength of the Steinitz exchange theorem - EACA 2006
The proof theoretic strength of the Steinitz exchange theorem - EACA 2006
 
Unambiguous functions in logarithmic space - CiE 2009
Unambiguous functions in logarithmic space - CiE 2009Unambiguous functions in logarithmic space - CiE 2009
Unambiguous functions in logarithmic space - CiE 2009
 
Feasible Combinatorial Matrix Theory - MFCS2013
Feasible Combinatorial Matrix Theory - MFCS2013Feasible Combinatorial Matrix Theory - MFCS2013
Feasible Combinatorial Matrix Theory - MFCS2013
 
The Proof Complexity of Linear Algebra - LICS 2002
The Proof Complexity of Linear Algebra - LICS 2002The Proof Complexity of Linear Algebra - LICS 2002
The Proof Complexity of Linear Algebra - LICS 2002
 
Perceptions of Foundational Knowledge by CS students - WCCCE 2012
Perceptions of Foundational Knowledge by CS students - WCCCE 2012Perceptions of Foundational Knowledge by CS students - WCCCE 2012
Perceptions of Foundational Knowledge by CS students - WCCCE 2012
 
Games on Posets - CiE 2008
Games on Posets - CiE 2008Games on Posets - CiE 2008
Games on Posets - CiE 2008
 
Boolean Programs and Quantified Propositional Proof System -
Boolean Programs and Quantified Propositional Proof System - Boolean Programs and Quantified Propositional Proof System -
Boolean Programs and Quantified Propositional Proof System -
 
How Safe is your Data?
How Safe is your Data?How Safe is your Data?
How Safe is your Data?
 
Circuit Complexity of Shuffle - IWOCA 2013
Circuit Complexity of Shuffle - IWOCA 2013Circuit Complexity of Shuffle - IWOCA 2013
Circuit Complexity of Shuffle - IWOCA 2013
 
An algorithmic view of Computer Science
An algorithmic view of Computer ScienceAn algorithmic view of Computer Science
An algorithmic view of Computer Science
 
Feasible proofs of matrix identities with csanky's algorithm - CSL 2005
Feasible proofs of matrix identities with csanky's algorithm - CSL 2005Feasible proofs of matrix identities with csanky's algorithm - CSL 2005
Feasible proofs of matrix identities with csanky's algorithm - CSL 2005
 

Similar to Algorithms on Strings

Similar to Algorithms on Strings (20)

Theory of computation
Theory of computationTheory of computation
Theory of computation
 
Automata
AutomataAutomata
Automata
 
Automata
AutomataAutomata
Automata
 
Module 1 TOC.pptx
Module 1 TOC.pptxModule 1 TOC.pptx
Module 1 TOC.pptx
 
Theory of computation Lec1
Theory of computation Lec1Theory of computation Lec1
Theory of computation Lec1
 
Algorithmics on SLP-compressed strings
Algorithmics on SLP-compressed stringsAlgorithmics on SLP-compressed strings
Algorithmics on SLP-compressed strings
 
End semexam | Theory of Computation | Akash Anand | MTH 401A | IIT Kanpur
End semexam | Theory of Computation | Akash Anand | MTH 401A | IIT KanpurEnd semexam | Theory of Computation | Akash Anand | MTH 401A | IIT Kanpur
End semexam | Theory of Computation | Akash Anand | MTH 401A | IIT Kanpur
 
draft
draftdraft
draft
 
Flat unit 1
Flat unit 1Flat unit 1
Flat unit 1
 
Formal language
Formal languageFormal language
Formal language
 
1 introduction
1 introduction1 introduction
1 introduction
 
Chapter 3 REGULAR EXPRESSION.pdf
Chapter 3 REGULAR EXPRESSION.pdfChapter 3 REGULAR EXPRESSION.pdf
Chapter 3 REGULAR EXPRESSION.pdf
 
Resumen material MIT
Resumen material MITResumen material MIT
Resumen material MIT
 
lec8.ppt
lec8.pptlec8.ppt
lec8.ppt
 
Chapter1 Formal Language and Automata Theory
Chapter1 Formal Language and Automata TheoryChapter1 Formal Language and Automata Theory
Chapter1 Formal Language and Automata Theory
 
Context free grammer.ppt
Context free grammer.pptContext free grammer.ppt
Context free grammer.ppt
 
Thoery of Computaion and Chomsky's Classification
Thoery of Computaion and Chomsky's ClassificationThoery of Computaion and Chomsky's Classification
Thoery of Computaion and Chomsky's Classification
 
QB104541.pdf
QB104541.pdfQB104541.pdf
QB104541.pdf
 
Mod 2_RegularExpressions.pptx
Mod 2_RegularExpressions.pptxMod 2_RegularExpressions.pptx
Mod 2_RegularExpressions.pptx
 
Algorithm Design and Complexity - Course 10
Algorithm Design and Complexity - Course 10Algorithm Design and Complexity - Course 10
Algorithm Design and Complexity - Course 10
 

Recently uploaded

Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxAArockiyaNisha
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...ssifa0344
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)Areesha Ahmad
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PPRINCE C P
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfSumit Kumar yadav
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSSLeenakshiTyagi
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisDiwakar Mishra
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINsankalpkumarsahoo174
 

Recently uploaded (20)

Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSS
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
 

Algorithms on Strings

  • 1. Algorithms on Strings Michael Soltys CSU Channel Islands Computer Science February 4, 2015 Strings - Soltys Math/CS Seminar Title - 1/27
  • 2. String problems are at the heart of Computer Science: Rewriting systems are Turing complete In practice analysis of strings is central to: Algorithmic biology Text processing Language theory Coding theory Strings - Soltys Math/CS Seminar Introduction - 2/27
  • 3. Basics (COMP 454) An alphabet is a finite, non-empty set of distinct symbols, denoted usually by Σ. e.g., Σ = {0, 1} (binary alphabet) Σ = {a, b, c, . . . , z} (lower-case letters alphabet) A string, also called word, is a finite ordered sequence of symbols chosen from some alphabet. e.g., 010011101011 |w| denotes the length of the string w. e.g., |010011101011| = 12 The empty string, ε, |ε| = 0, is in any Σ by default. Strings - Soltys Math/CS Seminar Introduction - 3/27
  • 4. Σk is the set of strings over Σ of length exactly k. e.g., If Σ = {0, 1}, then Σ0 = {ε} Σ1 = Σ Σ2 = {00, 01, 10, 11}, etc. |Σk |? Kleene’s star Σ∗ is the set of all strings over Σ. Σ∗ = Σ0 ∪ Σ1 ∪ Σ2 ∪ Σ3 ∪ . . . =Σ+ Concatenation If x, y are strings, and x = a1a2 . . . am & y = b1b2 . . . bn ⇒ x · y = xy juxtaposition = a1a2 . . . amb1b2 . . . bn UNIX cat command Strings - Soltys Math/CS Seminar Introduction - 4/27
  • 5. A language L is a collection of strings over some alphabet Σ, i.e., L ⊆ Σ∗. E.g., L = {ε, 01, 0011, 000111, . . .} = {0n 1n |n ≥ 0} (1) Note: wε = εw = w. {ε} = ∅; one is the language consisting of the single string ε, and the other is the empty language. Strings - Soltys Math/CS Seminar Introduction - 5/27
  • 6. Consider L = {w| w is of the form x01y ∈ Σ∗ } where Σ = {0, 1}. We want to specify a DFA A = (Q, Σ, δ, q0, F) that accepts all and only the strings in L. Σ = {0, 1}, Q = {q0, q1, q2}, and F = {q1}. Transition diagram q 1 0 0,1 10 q0 q2 1 Transition table 0 1 q0 q2 q0 q1 q1 q1 q2 q2 q1 Strings - Soltys Math/CS Seminar Introduction - 6/27
  • 7. A context-free grammar (CFG) is G = (V , T, P, S) — Variables, Terminals, Productions, Start variable Ex. P −→ ε|0|1|0P0|1P1. Ex. G = ({E, I}, T, P, E) where T = {+, ∗, (, ), a, b, 0, 1} and P is the following set of productions: E −→ I|E + E|E ∗ E|(E) I −→ a|b|Ia|Ib|I0|I1 If αAβ ∈ (V ∪ T)∗, A ∈ V , and A −→ γ is a production, then αAβ ⇒ αγβ. We use ∗ ⇒ to denote 0 or more steps. L(G) = {w ∈ T∗|S ∗ ⇒ w} Strings - Soltys Math/CS Seminar Introduction - 7/27
  • 8. Context-sensitive grammars (CSG) have rules of the form: α → β where α, β ∈ (T ∪ V )∗ and |α| ≤ |β|. A language is context sensitive if it has a CSG. Fact: It turns out that CSL = NTIME(n) A rewriting system (also called a Semi-Thue system) is a grammar where there are no restrictions; α → β for arbitrary α, β ∈ (V ∪ T)∗. Fact: It turns out that a rewriting system corresponds to the most general model of computation; i.e., a language has a rewriting system iff it is “computable.” Strings - Soltys Math/CS Seminar Introduction - 8/27
  • 9. A second course in Automata Chomsky-Schutzenberger Theorem: If L is a CFL, then there exists a regular language R, an n, and a homomorphism h, such that L = h(PARENn ∩ R). Parikh’s Theorem: If Σ = {a1, a2, . . . , an}, the signature of a string x ∈ Σ∗ is (#a1(x), #a2(x), . . . , #an(x)), i.e., the number of ocurrences of each symbol, in a fixed order. The signature of a language is defined by extension; regular and CFLs have the same signatures. Strings - Soltys Math/CS Seminar Introduction - 9/27
  • 10. This presentation is about algorithms on strings. Based on two papers that are coming out in the next months: Neerja Mhaskar and Michael Soltys Non-repetitive strings over alphabet lists to appear in WALCOM, February 2015. Neerja Mhaskar and Michael Soltys String Shuffle: Circuits and Graphs accepted in the Journal of Discrete Algorithms, 2015 Both at http://soltys.cs.csuci.edu (papers 3 & 19) Strings - Soltys Math/CS Seminar Introduction - 10/27
  • 11. Non-repetitive strings A word is non-repetitive if it does not contain a subword of the form vv. Word with repetition 010101110 Word without repetition 101 Easy observation: what is the smallest n so that any word over Σ = {0, 1} of length ≥ n has at least one repetition? Strings - Soltys Math/CS Seminar Non-repetitive strings - 11/27
  • 12. Original Thue problem For Σ3 = {1, 2, 3} and morphism, due to A. Thue: S =    1 → 12312 2 → 131232 3 → 1323132 Given a string w ∈ Σ∗ 3, we let S(w) denote w with every symbol replaced by its corresponding substitution: S(w) = S(w1w2 . . . wn) = S(w1)S(w2) . . . S(wn) Lemma: If w is non-repetitive then so is S(w). Strings - Soltys Math/CS Seminar Non-repetitive strings - 12/27
  • 13. Problem extended to alphabet lists List of alphabets L = L1, L2, . . . , Ln Can we generate non-repetitive words w = w1w2 . . . wn, such that the symbol wi ∈ Li ? Studied by: [GKM10], [Sha09], and it is a natural extension of the original problem posed and solved by A. Thue. E.g., L1 = {a, b, c}, L2 = {b, c, d}, L3 = {a, d, 2}, in this case w = ac2 is over L1, L2, L3 and non-repetitive. Is that true for any list where |Li | = 3 for all i? Strings - Soltys Math/CS Seminar Non-repetitive strings - 13/27
  • 14. [GKM10] shows that this can be done for |Li | = 4 for all i with this algorithm: pick any w1 ∈ L1 for i + 1 (w = w1w2 . . . wi is non-repetitive) pick a ∈ Li+1 if wa is non-repetitive, then let wi+1 = a if wa has a square vv, then vv must be a suffix delete the right copy of v from w, and restart. Using sophisticated Lov´asz Local Lemma argument and Catalan numbers we can show that the above algorithm succeeds with non-zero probability. Strings - Soltys Math/CS Seminar Non-repetitive strings - 14/27
  • 15. Particular “yes” cases for L1, L2, . . . , Ln Has a system of distinct representatives (SDR) Has the union property Can be mapped consistently to Σ3 = {1, 2, 3} It is a partition Strings - Soltys Math/CS Seminar Non-repetitive strings - 15/27
  • 16. Open Problem 1 Given any list L1, L2, . . . , Ln, where |Li | = 3, can we always find a non-repetitive string w over such a list? Strings - Soltys Math/CS Seminar Non-repetitive strings - 16/27
  • 17. Shuffle w is the shuffle of u, v: w = u v w = 0110110011101000 u = 01101110 v = 10101000 w = 0110110011101000 Strings - Soltys Math/CS Seminar Shuffle - 17/27
  • 18. Shuffle w is the shuffle of u, v: w = u v w = 0110110011101000 u = 01101110 v = 10101000 w = 0110110011101000 w is a shuffle of u and v provided: u = x1x2 · · · xk v = y1y2 · · · yk and w obtained by “interleaving” w = x1y1x2y2 · · · xkyk. Strings - Soltys Math/CS Seminar Shuffle - 17/27
  • 19. Square Shuffle w is a square provided it is equal to a shuffle of a u with itself, i.e., ∃u s.t. w = u u The string w = 0110110011101000 is a square: w = 0110110011101000 and u = 01101100 = 01101100 Strings - Soltys Math/CS Seminar Shuffle - 18/27
  • 20. Result from 2013 given an alphabet Σ, |Σ| ≥ 7, Square = {w : ∃u(w = u u)} is NP-complete. Strings - Soltys Math/CS Seminar Shuffle - 19/27
  • 21. Result from 2013 given an alphabet Σ, |Σ| ≥ 7, Square = {w : ∃u(w = u u)} is NP-complete. What we leave open: What about |Σ| = 2 (for |Σ| = 1, Square is just the set of even length strings) What about if |Σ| = ∞ but each symbol cannot occur more often than, say, 6 times (if each symbol occurs at most 4 times, Square can be reduced to 2-Sat – see P. Austrin Stack Exchange post http://bit.ly/WATco3) Strings - Soltys Math/CS Seminar Shuffle - 19/27
  • 22. Open Problem 2 Is Square NP-complete for alphabets of size {2, 3, 4, 5, 6} ? Strings - Soltys Math/CS Seminar Shuffle - 20/27
  • 23. Upper and lower bounds Shuffle(x, y, w) holds if and only if w is a shuffle of x, y Shuffle ∈ AC0 , but Shuffle ∈ AC1 . Strings - Soltys Math/CS Seminar Shuffle - 21/27
  • 24. Upper bound Strings - Soltys Math/CS Seminar Shuffle - 22/27
  • 25. Lower bound Parity(x) = 0 ≤ i ≤ |x| i is odd Shuffle(0|x|−i , 1i , x). Strings - Soltys Math/CS Seminar Shuffle - 23/27
  • 26. n−i i=1 i=3 i=5 i=n 0 x 1 1 10 0 0x x x1 ii n−i i in−i n−i Strings - Soltys Math/CS Seminar Shuffle - 24/27
  • 27. Open Problem 3 Is Shuffle in NC1 ? Strings - Soltys Math/CS Seminar Shuffle - 25/27
  • 28. Announcement of two upcoming seminars 1. February 16, 2015, 6:00-7:00pm Bell Tower 1471 Ryszard Janicki On Pairwise Comparisons Based Rankings 2. February 16, 2015, 7:00-8:00pm Bell Tower 1471 Neerja Mhaskar Repetition in Strings and String Shuffles Computer Science Seminars: http://compsci.csuci.edu/degrees/seminars.htm Strings - Soltys Math/CS Seminar Conclusion - 26/27
  • 29. References Jaroslaw Grytczuk, Jakub Kozik, and Pitor Micek. A new approach to nonrepetitive sequences. arXiv:1103.3809, December 2010. Jeffrey Shallit. A second course in formal languages and automata theory. Cambridge Univeristy Press, 2009. Strings - Soltys Math/CS Seminar References - 27/27