Előadást letölteni
Az előadás letöltése folymat van. Kérjük, várjon
KiadtaZsigmond Boros Megváltozta több, mint 8 éve
1
1 Konjunktív lekérdezések, lekérdezések tartalmazási problémája
2
2 Konjunktív lekérdezések (CQ) Az első rendű logikai (FO) lekérdezések (relációs kalkulusok) részosztálya. Megfelel a SELECT-DISTINCT-FROM-WHERE SQL lekérdezéseknek. A gyakorlatban előforduló legtöbb lekérdezés ilyen. Az optimalizáló algoritmusok leggyakrabban konjunktív lekérdezésekre használhatók, ezért bontsuk fel a lekérdezéseket CQ részlekérdezésekre. CQ lekérdezések jól meghatározható, elméletileg bizonyítható tulajdonságokkal rendelkeznek.
3
3 Konjunktív lekérdezések definíciója Definíció: a konjunktív lekérdezésekhez tartozó DRC formulákat rekurzívan definiáljuk: ahol és ’ konjunktív lekérdezéshez tartozó formula. Vegyük észre, hogy nem szerepel: , , Lekérdezésen időnként a lekérdezéshez tartozó formulát értjük. Vegyük észre, hogy CQ FO ::= R(t 1,..., t ar(R) ) | t i = t j | ’ | x.
4
4 Példák konjunktív lekérdezésekre Konjunktív lekérdezések (van-e valami kapcsolat köztük?) : Nem konjunktív lekérdezések (miért?): q(x,y) = z.(R(x,z) u.(R(z,u) R(u,y))) q(x,y) = z. u.(R(x,z) R(z,u) R(u,y)) q(x,y) = z.(R(x,z) u.(R(z,u) R(u,y))) q(x,y) = z. u.(R(x,z) R(z,u) R(u,y)) q(x,y) = z.(R(x,z) R(y,z)) q(x) = T(x) z.S(x,z)
5
5 Konjunktív lekérdezések Tetszőleges CQ lekérdezés ekvivalens (átírható) ilyen alakú CQ lekérdezéssel: (Vagyis az összes kvantor a formula elején szerepel.) Ugyanez Datalog jelöléssel: q(x 1,...,x n ) = y 1. y 2... y p.(R 1 (t 11,...,t 1m ) ... R k (t k1,...,t km )) q(x 1,...,x n ) :- R 1 (t 11,...,t 1m ),..., R k (t k1,...,t km )) fej törzs (test) Datalog szabály
6
6 Példák a Datalog jelölésre Employee(x), ManagedBy(x,y), Manager(y) (Az adatbázisban dolgozókat, fönököket, és a „kinek ki a főnöke” relációt tároljuk.) Keressük meg az összes dolgozót, akinek ugyanaz a főnöke, mint “Smith”-nek: A(x) :- ManagedBy(“Smith”,y), ManagedBy(x,y)
7
7 Példák a Datalog jelölésre Employee(x), ManagedBy(x,y), Manager(y) Az igazgató a fönök fönöke legyen. Keressük meg az összes dolgozót, akinek ugyanaz az igazgatója, mint “Smith”-nek: A(x) :- ManagedBy(“Smith”,y), ManagedBy(y,z), ManagedBy(x,u), ManagedBy(u,z)
8
8 A CQ és az SQL közti viszony select distinct m2.name from ManagedBy m1, ManagedBy m2 where m1.name=“Smith” AND m1.manager=m2.manager A(x) :- ManagedBy(“Smith”,y), ManagedBy(x,y) CQ: SQL: Fontos a “distinct”!
9
9 A CQ és az SQL közti viszony Véletlen volt, hogy át tudtuk írni, vagy a CQ lekérdezések pontosan a SELECT-DISTINCT-FROM-WHERE lekérdezéseknek felelnek meg?
10
10 CQ és a relációs algebra (RA) A CQ pontosan a C, A, műveletekkel felírható relációs algebrai kifejezéseknek felel meg, (nem szerepel benne: , –) $2.name name=“Smith” ManagedBy $1.manager=$2.manager A(x) :- ManagedBy(“Smith”,y), ManagedBy(x,y)
11
11 A CQ kiterjesztései CQ A(y) :- ManagedBy(x,y), ManagedBy(z,y), x z Kik azok a főnökök, akiknek LEGALÁBB 2 beosztottjuk van? : a lekérdezésben használható nem egyenlő is
12
12 A CQ kiterjesztései CQ < A(y) :- ManagedBy(x,y), Salary(x,u), Salary(y,v), u>v) Mely dolgozók keresnek többet mint a fönökük? : a lekérdezésben használható egyenlőtlenség is
13
13 Extensions of CQ CQ A(y) :- Office(“Alice”,u), Office(y,u), ManagedBy(“Alice”,x), ManagedBy(x,y) Find people sharing the same office with Alice, but not the same manager: : a lekérdezésben használható tagadás is
14
14 Extensions of CQ UCQ A(name) :- Employee(name, dept, age, salary), age > 50 A(name) :- RetiredEmployee(name, address) Union of conjuctive queries Datalog notation is very convenient at expressing unions (no need for ) Datalog program:
15
15 Extensions of CQ If we extend too much, we capture FO Theoreticians need to be careful: small extensions may make a huge difference on certain theoretical properties of CQ
16
16 Query Equivalence and Containment Justified by optimization needs Intensively studied since 1977
17
17 Query Equivalence Queries q 1 and q 2 are equivalent if for every database D, q 1 (D) = q 2 (D). Notation: q 1 q 2
18
18 Query Equivalence SELECT DISTINCT x.name, x.manager FROM Employee x, Employee y WHERE x.dept = ‘Sales’ and x.office = y.office and x.floor = 5 and y.dept = ‘Sales’ Hmmmm…. Is there a simple way to write that ?
19
19 Query Containment Query q 1 is contained in q 2 if for every database D, q 1 (D) q 2 (D). q 1 and q 2 are equivalent if for every database D, q 1 (D) = q 2 (D) Notation: q 1 q 2, q 1 q 2 Obviously: q 1 q 2 and q 2 q 1 iff q 1 q 2 Conversely: q 1 q 2 q 1 iff q 1 q 2 We will study the containment problem only.
20
20 Examples of Query Containments q 1 (x) :- R(x,u), R(u,v), R(v,w) q 2 (x) :- R(x,u), R(u,v) q 1 (x) :- R(x,u), R(u,v), R(v,w) q 2 (x) :- R(x,u), R(u,v) Is q 1 q 2 ?
21
21 Examples of Query Containments q 1 (x) :- R(x,u), R(u,v), R(v,x) q 2 (x) :- R(x,u), R(u,x) q 1 (x) :- R(x,u), R(u,v), R(v,x) q 2 (x) :- R(x,u), R(u,x) Is q 1 q 2 ?
22
22 Examples of Query Containments q 1 (x) :- R(x,u), R(u,u) q 2 (x) :- R(x,u), R(u,v), R(v,w) q 1 (x) :- R(x,u), R(u,u) q 2 (x) :- R(x,u), R(u,v), R(v,w) Is q 1 q 2 ?
23
23 Examples of Query Containments q 1 (x) :- R(x,u), R(u,”Smith”) q 2 (x) :- R(x,u), R(u,v) q 1 (x) :- R(x,u), R(u,”Smith”) q 2 (x) :- R(x,u), R(u,v) Is q 1 q 2 ?
24
24 Query Containment Theorem Query containment for FO is undecidable Theorem Query containment for CQ is decidable and NP-complete.
25
25 Query Containment Algorithm How to check q 1 q 2 Canonical database for q 1 is: D q1 = (D, R 1 D, …, R k D ) –D = all variables and constants in q 1 –R 1 D, …, R k D = the body of q 1 Canonical tuple for q 1 is: t q1 (the head of q 1 )
26
26 Examples of Canonical Databases Canonical database: D q1 = (D, R D ) –D={x,y,u,v} –R D = Canonical tuple: t q1 = (x,y) xu vu vy q1(x,y) :- R(x,u),R(v,u),R(v,y)
27
27 Examples of Canonical Databases D q1 = (D, R) –D={x,u,”Smith”,”Fred”} –R = t q1 = (x) xu u“Smith” u“Fred” uu q1(x) :- R(x,u), R(u,”Smith”), R(u,”Fred”), R(u, u)
28
28 Checking Containment Theorem: q 1 q 2 iff t q1 q 2 (D q1 ). Example: q 1 (x,y) :- R(x,u),R(v,u),R(v,y) q 2 (x,y) :- R(x,u),R(v,u),R(v,w),R(t,w),R(t,y) D={x,y,u,v} R =t q1 = (x,y) Yes, q 1 q 2 xu vu vy
29
29 Query Homomorphisms A homomorphism f : q 2 q 1 is a function f: var(q 2 ) var(q 1 ) const(q 1 ) such that: –f(body(q 2 )) body(q 1 ) –f(t q1 ) = t q2 The Homomorphism Theorem q 1 q 2 iff there exists a homomorphism f : q 2 q 1
30
30 Example of Query Homeomorphism var(q 1 ) = {x, u, v, y} var(q 2 ) = {x, u, v, w, t, y} q 1 (x,y) :- R(x,u),R(v,u),R(v,y) q 2 (x,y) :- R(x,u),R(v,u),R(v,w),R(t,w),R(t,y) Therefore q 1 q 2
31
31 Example of Query Homeomorphism var(q 1 ) const(q 1 ) = {x,u, “Smith”} var(q 2 ) = {x,u,v,w} q 1 (x) :- R(x,u), R(u,”Smith”), R(u,”Fred”), R(u, u) q 2 (x) :- R(x,u), R(u,v), R(u,”Smith”), R(w,u) Therefore q 1 q 2
32
32 The Homeomorphism Theorem Theorem Conjunctive query containment is: (1) decidable (why ?) (2) in NP (why ?) (3) NP-hard Short: it is NP-complete
33
33 Query Containment for UCQ q 1 q 2 q 3 .... q 1 ’ q 2 ’ q 3 ’ .... Notice: q 1 q 2 q 3 .... q iff q 1 q and q 2 q and q 3 q and …. Theorem q q 1 ’ q 2 ’ q 3 ’ .... Iff there exists some k such that q q k ’ It follows that containment for UCQ is decidable, NP-complete.
34
34 Query Containment for CQ < q 1 () :- R(x,y), R(y,x) q 2 () :- R(x,y), x <= y q 1 () :- R(x,y), R(y,x) q 2 () :- R(x,y), x <= y q 1 q 2 although there is no homomorphism ! To check containment do this: -Consider all possible orderings of variables in q1 -For each of them check containment of q1 in q2 -If all hold, then q 1 q 2 Still decidable, but harder than NP: now in p 2
35
35 Query Minimization Definition A conjunctive query q is minimal if for every other conjunctive query q’ s.t. q q’, q’ has at least as many predicates (‘subgoals’) as q Are these queries minimal ? q(x) :- R(x,y), R(y,z), R(x,x) q(x) :- R(x,y), R(y,z), R(x,’Alice’)
36
36 Query Minimization Query minimization algorithm Choose a subgoal g of q Remove g: let q’ be the new query We already know q q’ (why ?) If q’ q then permanently remove g Notice: the order in which we inspect subgoals doesn’t matter
37
37 Query Minimization In Practice No database system today performs minimization !!! Reason: –It’s hard (NP-complete) –Users don’t write non-minimal queries However, non-minimal queries arise when using views intensively
38
38 Query Minimization for Views CREATE VIEW HappyBoaters SELECT DISTINCT E1.name, E1.manager FROM Employee E1, Employee E2 WHERE E1.manager = E2.name and E1.boater=‘YES’ and E2.boater=‘YES’ This query is minimal
39
39 Query Minimization for Views SELECT DISTINCT H1.name FROM HappyBoaters H1, HappyBoaters H2 WHERE H1.manager = H2.name Now compute the Very-Happy-Boaters What happens in SQL when we run a query on a view ? This query is also minimal
40
40 Query Minimization for Views SELECT DISTINCT E1.name FROM Employee E1, Employee E2, Employee E3, Empolyee E4 WHERE E1.manager = E2.name and E1.boater = ‘YES’ and E2.boater = ‘YES’ and E3.manager = E4.name and E3.boater = ‘YES’ and E4.boater = ‘YES’ and E1.manager = E3.name View Expansion This query is no longer minimal ! E1 E3 E4 E2 E2 is redundant
41
41 Monotone Queries Definition A query q is monotone if: For every two databases D, D’ if D D’ then q(D) q(D’) Which queries below are monotone ? x.R(x,x) x. y. z. u.(R(x,y) R(y,z) R(z,u)) x. y.R(x,y)
42
42 Monotone Queries Theorem. Every conjunctive query is monotone Stronger: every UCQ query is monotone
43
43 How To Impress Your Students Or Your Boss Find all drinkers that like some beer that is not served by the bar “Black Cat” Can you write as a simple SELECT-FROM- WHERE (I.e. without a subquery) ? SELECT L.drinker FROM Likes L WHERE L.beer not in (SELECT S.beer FROM Serves S WHERE S.bar = ‘Black Cat’)
44
44 Expressive Power of FO The following queries cannot be expressed in FO: Transitive closure: – x. y. there exists x 1,..., x n s.t. R(x,x 1 ) R(x 1,x 2 ) ... R(x n-1,x n ) R(x n,y) Parity: the number of edges in R is even
45
45 Datalog Adds recursion, so we can compute transitive closure A datalog program (query) consists of several datalog rules: P 1 (t 1 ) :- body 1 P 2 (t 2 ) :- body 2.... P n (t n ) :- body n
46
46 Datalog Terminology: EDB = extensional database predicates –The database predicates IDB = intentional database predicates –The new predicates constructed by the program
47
47 Datalog Employee(x), ManagedBy(x,y), Manager(y) HMngr(x) :- Manager(x), ManagedBy(y,x), ManagedBy(z,y) Answer(x) :- HMngr(x), Employee(x) HMngr(x) :- Manager(x), ManagedBy(y,x), ManagedBy(z,y) Answer(x) :- HMngr(x), Employee(x) All higher level managers that are employees: EDBs IDBs
48
48 Datalog Employee(x), ManagedBy(x,y), Manager(y) Person(x) :- Manager(x) Person(x) :- Employee(x) Person(x) :- Manager(x) Person(x) :- Employee(x) All persons: Manger Employee
49
49 Unfolding non-recursive rules Graph: R(x,y) P(x,y) :- R(x,u), R(u,v), R(v,y) A(x,y) :- P(x,u), P(u,y) P(x,y) :- R(x,u), R(u,v), R(v,y) A(x,y) :- P(x,u), P(u,y) Can “unfold” it into: A(x,y) :- R(x,u), R(u,v), R(v,w), R(w,m), R(m,n), R(n,y)
50
50 Unfolding non-recursive rules Graph: R(x,y) P(x,y) :- R(x,y) P(x,y) :- R(x,u), R(u,y) A(x,y) :- P(x,y) P(x,y) :- R(x,y) P(x,y) :- R(x,u), R(u,y) A(x,y) :- P(x,y) Now the unfolding has a union: A(x,y) :- R(x,y) u(R(x,u) R(u,y))
51
51 Recursion in Datalog Graph: R(x,y) P(x,y) :- R(x,y) P(x,y) :- P(x,u), R(u,y) P(x,y) :- R(x,y) P(x,y) :- P(x,u), R(u,y) Transitive closure: P(x,y) :- R(x,y) P(x,y) :- P(x,u), P(u,y) P(x,y) :- R(x,y) P(x,y) :- P(x,u), P(u,y) Transitive closure:
52
52 Recursion in Datalog Boolean trees: Leaf0(x), Leaf1(x), AND(x, y 1, y 2 ), OR(x, y 1, y 2 ), Root(x) Write a program that computes: Answer() :- true if the root node is 1
53
53 Recursion in Datalog One(x):- Leaf1(x) One(x) :- AND(x, y 1, y 2 ), One(y 1 ), One(y 2 ) One(x) :- OR(x, y 1, y 2 ), One(y 1 ) One(x) :- OR(x, y 1, y 2 ), One(y 2 ) Answer() :- Root(x), One(x) One(x):- Leaf1(x) One(x) :- AND(x, y 1, y 2 ), One(y 1 ), One(y 2 ) One(x) :- OR(x, y 1, y 2 ), One(y 1 ) One(x) :- OR(x, y 1, y 2 ), One(y 2 ) Answer() :- Root(x), One(x)
54
54 Exercise Boolean trees: Leaf0(x), Leaf1(x), AND(x, y 1, y 2 ), OR(x, y 1, y 2 ), Not(x,y), Root(x) Write a datalog program that computes the set of nodes that are “true” or “one”. Hint: compute both One(x) and Zero(x) here you need to use Leaf0
55
55 Variants of Datalog without recursionwith recursion without Non-recursive Datalog = UCQ (why ?) Datalog with Non-recursive Datalog = FO Datalog
56
56 Non-recursive Datalog Union of Conjunctive Queries = UCQ –Containment is decidable, and NP-complete Non-recursive Datalog –Is equivalent to UCQ –Hence containment is decidable here too –Is it still NP-complete ?
57
57 Non-recursive Datalog A non-recursive datalog: Its unfolding as a CQ: How big is this query ? T 1 (x,y) :- R(x,u), R(u,y) T 2 (x,y) :- T 1 (x,u), T 1 (u,y)... T n (x,y) :- T n-1 (x,u), T n-1 (u,y) Answer(x,y) :- T n (x,y) T 1 (x,y) :- R(x,u), R(u,y) T 2 (x,y) :- T 1 (x,u), T 1 (u,y)... T n (x,y) :- T n-1 (x,u), T n-1 (u,y) Answer(x,y) :- T n (x,y) Anser(x,y) :- R(x,u 1 ), R(u 1, u 2 ), R(u 2, u 3 ),... R(u m, y)
58
58 Query Complexity Given a query in FO And given a model D = (D, R 1 D, …, R k D ) What is the complexity of computing the answer (D)
59
59 Query Complexity Vardi’s classification: Data Complexity: Fix . Compute (D) as a function of |D| Query Complexity: Fix D. Compute (D) as a function of | | Combined Complexity: Compute (D) as a function of |D| and | | Which is the most important in databases ?
60
60 Example (x) u.(R(u,x) y.( v.S(y,v) R(x,y))) 38 75 08 097 69 76 898 987 40 434 558 86 979 667 47 68 R = S = How do we proceed ?
61
61 General Evaluation Algorithm for every subexpression i of , (i = 1, …, m) compute the answer to i as a table T i (x 1, …, x n ) return T m for every subexpression i of , (i = 1, …, m) compute the answer to i as a table T i (x 1, …, x n ) return T m Theorem. If has k variables then one can compute (D) in time O(| |*|D| k ) Data Complexity = O(|D| k ) = in PTIME Query Complexity = O(| |*c k ) = in EXPTIME
62
62 General Evaluation Algorithm Example: 1 (u,x) R(u,x) 2 (y,v) S(y,v) 3 (x,y) R(x,y) 4 (y) v. 2 (y,v) 5 (x,y) 4 (y) 3 (x,y) 6 (x) y. 5 (x,y) 7 (u,x) 1 (u,x) 6 (x) 8 (x) u. 7 (u,x) (x) (x) u.(R(u,x) y.( v.S(y,v) R(x,y)))
63
63 Complexity Theorem. If has k variables then one can compute (D) in time O(| |*|D| k ) Remark. The number of variables matters !
64
64 Paying Attention to Variables Compute all chains of length m We used m+1 variables Can you rewrite it with fewer variables ? Chain m (x,y) :- R(x,u 1 ), R(u 1, u 2 ), R(u 2, u 3 ),... R(u m-1, y)
65
65 Counting Variables FO k = FO restricted to variables x 1,…, x k Write Chain m in FO 3 : Chain m (x,y) :- u.R(x,u) ( x.R(u, x) ( u.R(x,u)… ( u. R(u, y)…))
66
66 Query Complexity Note: it suffices to investigate boolean queries only –If non-boolean, do this: for a 1 in D, …, a k in D if (a 1, …, a k ) in (D) /* this is a boolean query */ then output (a 1, …, a k ) for a 1 in D, …, a k in D if (a 1, …, a k ) in (D) /* this is a boolean query */ then output (a 1, …, a k )
67
67 Computational Complexity Classes Recall computational complexity classes: AC 0 LOGSPACE NLOGSPACE PTIME NP PSPACE EXPTIME EXPSPACE (Kalmar) Elementary Functions Turing Computable functions
68
68 Data Complexity of Query Languages AC 0 PTIME PSPACE FO = non-rec datalog FO(LFP) = datalog FO(PFP) = datalog ,* Important: the more complex a QL, the harder it is to optimize Paper: On the Unusual Effectiveness of Logic in Computer Science
69
69 Views Employee(x), ManagedBy(x,y), Manager(y) L(x,y) :- ManagedBy(x,u), ManagedBy(u,y) Q(x,y) :- ManagedBy(x,u), ManagedBy(u,v), ManagedBy(v,w), ManagedBy(w,y), Employee(y) E(x,y) :- ManagedBy(x,y), Employee(y) How can we answer Q if we only have L and E ? Views Query
70
70 Views Query rewriting using views (when possible): Query answering: –Sometimes we cannot express it in CQ or FO, but we can still answer it Q(x,y) :- L(x,u), L(u,y), E(v,y)
71
71 Views Applications: Using advanced indexes Using replicated data Data integration [Ullman’99]
72
72 Finding a Rewriting Theorem Given views V 1, …, V n and query Q, the problem whether Q has a complete rewriting in terms of V 1, …, V n is NP complete
73
73 Certain Answers Sometimes we can’t answer, but we can get close V1(x,y) :- E(x,u), E(u,v), E(v,y) V2(x,y) :- E(x,u), E(u,y), Black(y) Q(x) :- E(x,u), E(u,v), E(v,w), E(w,s) Can’t really answer Q, but we can find approximations….
74
74 Certain Answers V1(x,y) :- E(x,u), E(u,v), E(v,y) V2(x,y) :- E(x,u), E(u,y), Black(y) Q(x) :- E(x,u), E(u,v), E(v,w), E(w,s) Q(x) :- V2(x,u), V2(u,v) Q(x) :- V1(x,u), V2(u,v) Q(x) :- V1(x,u), V1(u,v) All these return ‘certain’ answers…
75
75 Certain Answers Definition. Given V 1, …, V n, their answers A 1, …, A n and a query Q, a tuple t is a certain tuple for Q iff for every database instance D: if A 1 =V 1 (D) and … and A n =V n (D) then t Q(D) if A 1 V 1 (D) and … and A n V n (D) then t Q(D) CWD (Closed World Assumption) OWD (Open World Assumption)
76
76 Computing Certain Answers Under OWD V1(x,y) :- E(x,u), E(u,v), E(v,y) V2(x,y) :- E(x,u), E(u,y), Black(y) Q(x) :- E(x,u), E(u,v), E(v,w), E(w,s) E(x,f(x,y)) :- V1(x,y) E(f(x,y),g(x,y)) :- V1(x,y) E(g(x,y),y) :- V1(x,y) E(x,h(x,y)) :- V2(x,y) E(h(x,y),y) :- V2(x,y) Black(y) :- V2(x,y) Inverse rules Combined datalog program
77
77 Computing Certain Answers Under OWD Next, we have two options Run the combined “datalog” program –It is actually a Prolog program –Notice: data complexity is PTIME Transform the datalog program first, so Q returns only values that are not Skolem Terms
78
78 Computing Answer Under CWD Is Different V1(x) :- R(x,u) V2(y) :- R(v,y) Q(x,y) :- R(x,y) A1 = {a} A2 = {b} Certain answers for Q under OWD: none Certain answers for Q under CWD: (a,b) Why ? Certain answers for Q under OWD: none Certain answers for Q under CWD: (a,b) Why ?
Hasonló előadás
© 2024 SlidePlayer.hu Inc.
All rights reserved.