Az előadás letöltése folymat van. Kérjük, várjon

Az előadás letöltése folymat van. Kérjük, várjon

1 Konjunktív lekérdezések, lekérdezések tartalmazási problémája.

Hasonló előadás


Az előadások a következő témára: "1 Konjunktív lekérdezések, lekérdezések tartalmazási problémája."— Előadás másolata:

1 1 Konjunktív lekérdezések, lekérdezések tartalmazási problémája

2 2 Konjunktív lekérdezések (CQ) Az első rendű logikai (FO) lekérdezések (relációs kalkulusok) részosztálya. Megfelel a SELECT-DISTINCT-FROM-WHERE SQL lekérdezéseknek. A gyakorlatban előforduló legtöbb lekérdezés ilyen. Az optimalizáló algoritmusok leggyakrabban konjunktív lekérdezésekre használhatók, ezért bontsuk fel a lekérdezéseket CQ részlekérdezésekre. CQ lekérdezések jól meghatározható, elméletileg bizonyítható tulajdonságokkal rendelkeznek.

3 3 Konjunktív lekérdezések definíciója Definíció: a konjunktív lekérdezésekhez tartozó DRC formulákat rekurzívan definiáljuk: ahol  és  ’ konjunktív lekérdezéshez tartozó formula. Vegyük észre, hogy nem szerepel: , ,  Lekérdezésen időnként a lekérdezéshez tartozó formulát értjük. Vegyük észre, hogy CQ  FO  ::= R(t 1,..., t ar(R) ) | t i = t j |    ’ |  x. 

4 4 Példák konjunktív lekérdezésekre Konjunktív lekérdezések (van-e valami kapcsolat köztük?) : Nem konjunktív lekérdezések (miért?): q(x,y) =  z.(R(x,z)   u.(R(z,u)  R(u,y))) q(x,y) =  z.  u.(R(x,z)  R(z,u)  R(u,y)) q(x,y) =  z.(R(x,z)   u.(R(z,u)  R(u,y))) q(x,y) =  z.  u.(R(x,z)  R(z,u)  R(u,y)) q(x,y) =  z.(R(x,z)  R(y,z)) q(x) = T(x)   z.S(x,z)

5 5 Konjunktív lekérdezések Tetszőleges CQ lekérdezés ekvivalens (átírható) ilyen alakú CQ lekérdezéssel: (Vagyis az összes kvantor a formula elején szerepel.) Ugyanez Datalog jelöléssel: q(x 1,...,x n ) =  y 1.  y 2...  y p.(R 1 (t 11,...,t 1m ) ...  R k (t k1,...,t km )) q(x 1,...,x n ) :- R 1 (t 11,...,t 1m ),..., R k (t k1,...,t km )) fej törzs (test) Datalog szabály

6 6 Példák a Datalog jelölésre Employee(x), ManagedBy(x,y), Manager(y) (Az adatbázisban dolgozókat, fönököket, és a „kinek ki a főnöke” relációt tároljuk.) Keressük meg az összes dolgozót, akinek ugyanaz a főnöke, mint “Smith”-nek: A(x) :- ManagedBy(“Smith”,y), ManagedBy(x,y)

7 7 Példák a Datalog jelölésre Employee(x), ManagedBy(x,y), Manager(y) Az igazgató a fönök fönöke legyen. Keressük meg az összes dolgozót, akinek ugyanaz az igazgatója, mint “Smith”-nek: A(x) :- ManagedBy(“Smith”,y), ManagedBy(y,z), ManagedBy(x,u), ManagedBy(u,z)

8 8 A CQ és az SQL közti viszony select distinct m2.name from ManagedBy m1, ManagedBy m2 where m1.name=“Smith” AND m1.manager=m2.manager A(x) :- ManagedBy(“Smith”,y), ManagedBy(x,y) CQ: SQL: Fontos a “distinct”!

9 9 A CQ és az SQL közti viszony Véletlen volt, hogy át tudtuk írni, vagy a CQ lekérdezések pontosan a SELECT-DISTINCT-FROM-WHERE lekérdezéseknek felelnek meg?

10 10 CQ és a relációs algebra (RA) A CQ pontosan a  C,  A,  műveletekkel felírható relációs algebrai kifejezéseknek felel meg, (nem szerepel benne: , –)  $2.name  name=“Smith” ManagedBy $1.manager=$2.manager A(x) :- ManagedBy(“Smith”,y), ManagedBy(x,y)

11 11 A CQ kiterjesztései CQ  A(y) :- ManagedBy(x,y), ManagedBy(z,y), x  z Kik azok a főnökök, akiknek LEGALÁBB 2 beosztottjuk van? : a lekérdezésben használható nem egyenlő is

12 12 A CQ kiterjesztései CQ < A(y) :- ManagedBy(x,y), Salary(x,u), Salary(y,v), u>v) Mely dolgozók keresnek többet mint a fönökük? : a lekérdezésben használható egyenlőtlenség is

13 13 Extensions of CQ CQ  A(y) :- Office(“Alice”,u), Office(y,u), ManagedBy(“Alice”,x),  ManagedBy(x,y) Find people sharing the same office with Alice, but not the same manager: : a lekérdezésben használható tagadás is

14 14 Extensions of CQ UCQ A(name) :- Employee(name, dept, age, salary), age > 50 A(name) :- RetiredEmployee(name, address) Union of conjuctive queries Datalog notation is very convenient at expressing unions (no need for  ) Datalog program:

15 15 Extensions of CQ If we extend too much, we capture FO Theoreticians need to be careful: small extensions may make a huge difference on certain theoretical properties of CQ

16 16 Query Equivalence and Containment Justified by optimization needs Intensively studied since 1977

17 17 Query Equivalence Queries q 1 and q 2 are equivalent if for every database D, q 1 (D) = q 2 (D). Notation: q 1  q 2

18 18 Query Equivalence SELECT DISTINCT x.name, x.manager FROM Employee x, Employee y WHERE x.dept = ‘Sales’ and x.office = y.office and x.floor = 5 and y.dept = ‘Sales’ Hmmmm…. Is there a simple way to write that ?

19 19 Query Containment Query q 1 is contained in q 2 if for every database D, q 1 (D)  q 2 (D). q 1 and q 2 are equivalent if for every database D, q 1 (D) = q 2 (D) Notation: q 1  q 2, q 1  q 2 Obviously: q 1  q 2 and q 2  q 1 iff q 1  q 2 Conversely: q 1  q 2  q 1 iff q 1  q 2 We will study the containment problem only.

20 20 Examples of Query Containments q 1 (x) :- R(x,u), R(u,v), R(v,w) q 2 (x) :- R(x,u), R(u,v) q 1 (x) :- R(x,u), R(u,v), R(v,w) q 2 (x) :- R(x,u), R(u,v) Is q 1  q 2 ?

21 21 Examples of Query Containments q 1 (x) :- R(x,u), R(u,v), R(v,x) q 2 (x) :- R(x,u), R(u,x) q 1 (x) :- R(x,u), R(u,v), R(v,x) q 2 (x) :- R(x,u), R(u,x) Is q 1  q 2 ?

22 22 Examples of Query Containments q 1 (x) :- R(x,u), R(u,u) q 2 (x) :- R(x,u), R(u,v), R(v,w) q 1 (x) :- R(x,u), R(u,u) q 2 (x) :- R(x,u), R(u,v), R(v,w) Is q 1  q 2 ?

23 23 Examples of Query Containments q 1 (x) :- R(x,u), R(u,”Smith”) q 2 (x) :- R(x,u), R(u,v) q 1 (x) :- R(x,u), R(u,”Smith”) q 2 (x) :- R(x,u), R(u,v) Is q 1  q 2 ?

24 24 Query Containment Theorem Query containment for FO is undecidable Theorem Query containment for CQ is decidable and NP-complete.

25 25 Query Containment Algorithm How to check q 1  q 2 Canonical database for q 1 is: D q1 = (D, R 1 D, …, R k D ) –D = all variables and constants in q 1 –R 1 D, …, R k D = the body of q 1 Canonical tuple for q 1 is: t q1 (the head of q 1 )

26 26 Examples of Canonical Databases Canonical database: D q1 = (D, R D ) –D={x,y,u,v} –R D = Canonical tuple: t q1 = (x,y) xu vu vy q1(x,y) :- R(x,u),R(v,u),R(v,y)

27 27 Examples of Canonical Databases D q1 = (D, R) –D={x,u,”Smith”,”Fred”} –R = t q1 = (x) xu u“Smith” u“Fred” uu q1(x) :- R(x,u), R(u,”Smith”), R(u,”Fred”), R(u, u)

28 28 Checking Containment Theorem: q 1  q 2 iff t q1  q 2 (D q1 ). Example: q 1 (x,y) :- R(x,u),R(v,u),R(v,y) q 2 (x,y) :- R(x,u),R(v,u),R(v,w),R(t,w),R(t,y) D={x,y,u,v} R =t q1 = (x,y) Yes, q 1  q 2 xu vu vy

29 29 Query Homomorphisms A homomorphism f : q 2  q 1 is a function f: var(q 2 )  var(q 1 )  const(q 1 ) such that: –f(body(q 2 ))  body(q 1 ) –f(t q1 ) = t q2 The Homomorphism Theorem q 1  q 2 iff there exists a homomorphism f : q 2  q 1

30 30 Example of Query Homeomorphism var(q 1 ) = {x, u, v, y} var(q 2 ) = {x, u, v, w, t, y} q 1 (x,y) :- R(x,u),R(v,u),R(v,y) q 2 (x,y) :- R(x,u),R(v,u),R(v,w),R(t,w),R(t,y) Therefore q 1  q 2

31 31 Example of Query Homeomorphism var(q 1 )  const(q 1 ) = {x,u, “Smith”} var(q 2 ) = {x,u,v,w} q 1 (x) :- R(x,u), R(u,”Smith”), R(u,”Fred”), R(u, u) q 2 (x) :- R(x,u), R(u,v), R(u,”Smith”), R(w,u) Therefore q 1  q 2

32 32 The Homeomorphism Theorem Theorem Conjunctive query containment is: (1) decidable (why ?) (2) in NP (why ?) (3) NP-hard Short: it is NP-complete

33 33 Query Containment for UCQ q 1  q 2  q 3 ....  q 1 ’  q 2 ’  q 3 ’ .... Notice: q 1  q 2  q 3 ....  q iff q 1  q and q 2  q and q 3  q and …. Theorem q  q 1 ’  q 2 ’  q 3 ’ .... Iff there exists some k such that q  q k ’ It follows that containment for UCQ is decidable, NP-complete.

34 34 Query Containment for CQ < q 1 () :- R(x,y), R(y,x) q 2 () :- R(x,y), x <= y q 1 () :- R(x,y), R(y,x) q 2 () :- R(x,y), x <= y q 1  q 2 although there is no homomorphism ! To check containment do this: -Consider all possible orderings of variables in q1 -For each of them check containment of q1 in q2 -If all hold, then q 1  q 2 Still decidable, but harder than NP: now in  p 2

35 35 Query Minimization Definition A conjunctive query q is minimal if for every other conjunctive query q’ s.t. q  q’, q’ has at least as many predicates (‘subgoals’) as q Are these queries minimal ? q(x) :- R(x,y), R(y,z), R(x,x) q(x) :- R(x,y), R(y,z), R(x,’Alice’)

36 36 Query Minimization Query minimization algorithm Choose a subgoal g of q Remove g: let q’ be the new query We already know q  q’ (why ?) If q’  q then permanently remove g Notice: the order in which we inspect subgoals doesn’t matter

37 37 Query Minimization In Practice No database system today performs minimization !!! Reason: –It’s hard (NP-complete) –Users don’t write non-minimal queries However, non-minimal queries arise when using views intensively

38 38 Query Minimization for Views CREATE VIEW HappyBoaters SELECT DISTINCT E1.name, E1.manager FROM Employee E1, Employee E2 WHERE E1.manager = E2.name and E1.boater=‘YES’ and E2.boater=‘YES’ This query is minimal

39 39 Query Minimization for Views SELECT DISTINCT H1.name FROM HappyBoaters H1, HappyBoaters H2 WHERE H1.manager = H2.name Now compute the Very-Happy-Boaters What happens in SQL when we run a query on a view ? This query is also minimal

40 40 Query Minimization for Views SELECT DISTINCT E1.name FROM Employee E1, Employee E2, Employee E3, Empolyee E4 WHERE E1.manager = E2.name and E1.boater = ‘YES’ and E2.boater = ‘YES’ and E3.manager = E4.name and E3.boater = ‘YES’ and E4.boater = ‘YES’ and E1.manager = E3.name View Expansion This query is no longer minimal ! E1 E3 E4 E2 E2 is redundant

41 41 Monotone Queries Definition A query q is monotone if: For every two databases D, D’ if D  D’ then q(D)  q(D’) Which queries below are monotone ?    x.R(x,x)    x.  y.  z.  u.(R(x,y)  R(y,z)  R(z,u))    x.  y.R(x,y)

42 42 Monotone Queries Theorem. Every conjunctive query is monotone Stronger: every UCQ query is monotone

43 43 How To Impress Your Students Or Your Boss Find all drinkers that like some beer that is not served by the bar “Black Cat” Can you write as a simple SELECT-FROM- WHERE (I.e. without a subquery) ? SELECT L.drinker FROM Likes L WHERE L.beer not in (SELECT S.beer FROM Serves S WHERE S.bar = ‘Black Cat’)

44 44 Expressive Power of FO The following queries cannot be expressed in FO: Transitive closure: –  x.  y. there exists x 1,..., x n s.t. R(x,x 1 )  R(x 1,x 2 ) ...  R(x n-1,x n )  R(x n,y) Parity: the number of edges in R is even

45 45 Datalog Adds recursion, so we can compute transitive closure A datalog program (query) consists of several datalog rules: P 1 (t 1 ) :- body 1 P 2 (t 2 ) :- body P n (t n ) :- body n

46 46 Datalog Terminology: EDB = extensional database predicates –The database predicates IDB = intentional database predicates –The new predicates constructed by the program

47 47 Datalog Employee(x), ManagedBy(x,y), Manager(y) HMngr(x) :- Manager(x), ManagedBy(y,x), ManagedBy(z,y) Answer(x) :- HMngr(x), Employee(x) HMngr(x) :- Manager(x), ManagedBy(y,x), ManagedBy(z,y) Answer(x) :- HMngr(x), Employee(x) All higher level managers that are employees: EDBs IDBs

48 48 Datalog Employee(x), ManagedBy(x,y), Manager(y) Person(x) :- Manager(x) Person(x) :- Employee(x) Person(x) :- Manager(x) Person(x) :- Employee(x) All persons: Manger  Employee

49 49 Unfolding non-recursive rules Graph: R(x,y) P(x,y) :- R(x,u), R(u,v), R(v,y) A(x,y) :- P(x,u), P(u,y) P(x,y) :- R(x,u), R(u,v), R(v,y) A(x,y) :- P(x,u), P(u,y) Can “unfold” it into: A(x,y) :- R(x,u), R(u,v), R(v,w), R(w,m), R(m,n), R(n,y)

50 50 Unfolding non-recursive rules Graph: R(x,y) P(x,y) :- R(x,y) P(x,y) :- R(x,u), R(u,y) A(x,y) :- P(x,y) P(x,y) :- R(x,y) P(x,y) :- R(x,u), R(u,y) A(x,y) :- P(x,y) Now the unfolding has a union: A(x,y) :- R(x,y)   u(R(x,u)  R(u,y))

51 51 Recursion in Datalog Graph: R(x,y) P(x,y) :- R(x,y) P(x,y) :- P(x,u), R(u,y) P(x,y) :- R(x,y) P(x,y) :- P(x,u), R(u,y) Transitive closure: P(x,y) :- R(x,y) P(x,y) :- P(x,u), P(u,y) P(x,y) :- R(x,y) P(x,y) :- P(x,u), P(u,y) Transitive closure:

52 52 Recursion in Datalog Boolean trees: Leaf0(x), Leaf1(x), AND(x, y 1, y 2 ), OR(x, y 1, y 2 ), Root(x) Write a program that computes: Answer() :- true if the root node is 1

53 53 Recursion in Datalog One(x):- Leaf1(x) One(x) :- AND(x, y 1, y 2 ), One(y 1 ), One(y 2 ) One(x) :- OR(x, y 1, y 2 ), One(y 1 ) One(x) :- OR(x, y 1, y 2 ), One(y 2 ) Answer() :- Root(x), One(x) One(x):- Leaf1(x) One(x) :- AND(x, y 1, y 2 ), One(y 1 ), One(y 2 ) One(x) :- OR(x, y 1, y 2 ), One(y 1 ) One(x) :- OR(x, y 1, y 2 ), One(y 2 ) Answer() :- Root(x), One(x)

54 54 Exercise Boolean trees: Leaf0(x), Leaf1(x), AND(x, y 1, y 2 ), OR(x, y 1, y 2 ), Not(x,y), Root(x) Write a datalog program that computes the set of nodes that are “true” or “one”. Hint: compute both One(x) and Zero(x) here you need to use Leaf0

55 55 Variants of Datalog without recursionwith recursion without  Non-recursive Datalog = UCQ (why ?) Datalog with  Non-recursive Datalog  = FO Datalog 

56 56 Non-recursive Datalog Union of Conjunctive Queries = UCQ –Containment is decidable, and NP-complete Non-recursive Datalog –Is equivalent to UCQ –Hence containment is decidable here too –Is it still NP-complete ?

57 57 Non-recursive Datalog A non-recursive datalog: Its unfolding as a CQ: How big is this query ? T 1 (x,y) :- R(x,u), R(u,y) T 2 (x,y) :- T 1 (x,u), T 1 (u,y)... T n (x,y) :- T n-1 (x,u), T n-1 (u,y) Answer(x,y) :- T n (x,y) T 1 (x,y) :- R(x,u), R(u,y) T 2 (x,y) :- T 1 (x,u), T 1 (u,y)... T n (x,y) :- T n-1 (x,u), T n-1 (u,y) Answer(x,y) :- T n (x,y) Anser(x,y) :- R(x,u 1 ), R(u 1, u 2 ), R(u 2, u 3 ),... R(u m, y)

58 58 Query Complexity Given a query  in FO And given a model D = (D, R 1 D, …, R k D ) What is the complexity of computing the answer  (D)

59 59 Query Complexity Vardi’s classification: Data Complexity: Fix . Compute  (D) as a function of |D| Query Complexity: Fix D. Compute  (D) as a function of |  | Combined Complexity: Compute  (D) as a function of |D| and |  | Which is the most important in databases ?

60 60 Example  (x)   u.(R(u,x)   y.(  v.S(y,v)   R(x,y))) R = S = How do we proceed ?

61 61 General Evaluation Algorithm for every subexpression  i of , (i = 1, …, m) compute the answer to  i as a table T i (x 1, …, x n ) return T m for every subexpression  i of , (i = 1, …, m) compute the answer to  i as a table T i (x 1, …, x n ) return T m Theorem. If  has k variables then one can compute  (D) in time O(|  |*|D| k ) Data Complexity = O(|D| k ) = in PTIME Query Complexity = O(|  |*c k ) = in EXPTIME

62 62 General Evaluation Algorithm Example:  1 (u,x)  R(u,x)  2 (y,v)  S(y,v)  3 (x,y)   R(x,y)  4 (y)   v.  2 (y,v)  5 (x,y)   4 (y)   3 (x,y)  6 (x)   y.  5 (x,y)  7 (u,x)   1 (u,x)   6 (x)  8 (x)   u.  7 (u,x)   (x)  (x)   u.(R(u,x)   y.(  v.S(y,v)   R(x,y)))

63 63 Complexity Theorem. If  has k variables then one can compute  (D) in time O(|  |*|D| k ) Remark. The number of variables matters !

64 64 Paying Attention to Variables Compute all chains of length m We used m+1 variables Can you rewrite it with fewer variables ? Chain m (x,y) :- R(x,u 1 ), R(u 1, u 2 ), R(u 2, u 3 ),... R(u m-1, y)

65 65 Counting Variables FO k = FO restricted to variables x 1,…, x k Write Chain m in FO 3 : Chain m (x,y) :-  u.R(x,u)  (  x.R(u, x)  (  u.R(x,u)…  (  u. R(u, y)…))

66 66 Query Complexity Note: it suffices to investigate boolean queries only –If non-boolean, do this: for a 1 in D, …, a k in D if (a 1, …, a k ) in  (D) /* this is a boolean query */ then output (a 1, …, a k ) for a 1 in D, …, a k in D if (a 1, …, a k ) in  (D) /* this is a boolean query */ then output (a 1, …, a k )

67 67 Computational Complexity Classes Recall computational complexity classes: AC 0 LOGSPACE NLOGSPACE PTIME NP PSPACE EXPTIME EXPSPACE (Kalmar) Elementary Functions Turing Computable functions

68 68 Data Complexity of Query Languages AC 0 PTIME PSPACE FO = non-rec datalog  FO(LFP) = datalog  FO(PFP) = datalog ,* Important: the more complex a QL, the harder it is to optimize Paper: On the Unusual Effectiveness of Logic in Computer Science

69 69 Views Employee(x), ManagedBy(x,y), Manager(y) L(x,y) :- ManagedBy(x,u), ManagedBy(u,y) Q(x,y) :- ManagedBy(x,u), ManagedBy(u,v), ManagedBy(v,w), ManagedBy(w,y), Employee(y) E(x,y) :- ManagedBy(x,y), Employee(y) How can we answer Q if we only have L and E ? Views Query

70 70 Views Query rewriting using views (when possible): Query answering: –Sometimes we cannot express it in CQ or FO, but we can still answer it Q(x,y) :- L(x,u), L(u,y), E(v,y)

71 71 Views Applications: Using advanced indexes Using replicated data Data integration [Ullman’99]

72 72 Finding a Rewriting Theorem Given views V 1, …, V n and query Q, the problem whether Q has a complete rewriting in terms of V 1, …, V n is NP complete

73 73 Certain Answers Sometimes we can’t answer, but we can get close V1(x,y) :- E(x,u), E(u,v), E(v,y) V2(x,y) :- E(x,u), E(u,y), Black(y) Q(x) :- E(x,u), E(u,v), E(v,w), E(w,s) Can’t really answer Q, but we can find approximations….

74 74 Certain Answers V1(x,y) :- E(x,u), E(u,v), E(v,y) V2(x,y) :- E(x,u), E(u,y), Black(y) Q(x) :- E(x,u), E(u,v), E(v,w), E(w,s) Q(x) :- V2(x,u), V2(u,v) Q(x) :- V1(x,u), V2(u,v) Q(x) :- V1(x,u), V1(u,v) All these return ‘certain’ answers…

75 75 Certain Answers Definition. Given V 1, …, V n, their answers A 1, …, A n and a query Q, a tuple t is a certain tuple for Q iff for every database instance D: if A 1 =V 1 (D) and … and A n =V n (D) then t  Q(D) if A 1  V 1 (D) and … and A n  V n (D) then t  Q(D) CWD (Closed World Assumption) OWD (Open World Assumption)

76 76 Computing Certain Answers Under OWD V1(x,y) :- E(x,u), E(u,v), E(v,y) V2(x,y) :- E(x,u), E(u,y), Black(y) Q(x) :- E(x,u), E(u,v), E(v,w), E(w,s) E(x,f(x,y)) :- V1(x,y) E(f(x,y),g(x,y)) :- V1(x,y) E(g(x,y),y) :- V1(x,y) E(x,h(x,y)) :- V2(x,y) E(h(x,y),y) :- V2(x,y) Black(y) :- V2(x,y) Inverse rules Combined datalog program

77 77 Computing Certain Answers Under OWD Next, we have two options Run the combined “datalog” program –It is actually a Prolog program –Notice: data complexity is PTIME Transform the datalog program first, so Q returns only values that are not Skolem Terms

78 78 Computing Answer Under CWD Is Different V1(x) :- R(x,u) V2(y) :- R(v,y) Q(x,y) :- R(x,y) A1 = {a} A2 = {b} Certain answers for Q under OWD: none Certain answers for Q under CWD: (a,b) Why ? Certain answers for Q under OWD: none Certain answers for Q under CWD: (a,b) Why ?


Letölteni ppt "1 Konjunktív lekérdezések, lekérdezések tartalmazási problémája."