Or better said when Sphinx can help MySQL on queries that at first look they don’t involve any fulltext searching.
Sphinx was build in mind to help the DB on fulltext queries. But it can also help on where there is no text search. That is everyday used queries with combined filtering,grouping and sorting used for various analytics, reporting of simply general usage.
In Sphinx, the fulltext query is executed first, creating a result set that is passed to the remaining operations ( filters, groups, sorts). By reducing the size of the set that is interogated, the whole query will not be only faster, but it will consume less resources.
Because of design for speed, Sphinx can group and sort a lot faster and can do easy segmentations or getting top-N best group matches in a single query.
The result will be offloading heavy work done by database nodes to even a single Sphinx server.
Slides were presented at PerconaLive London 2013
4. What is Sphinx
•
•
•
•
free, open-source, search server
fast
700 qps /core / 1M docs
flexible
100+ features
scalable
o
o
300 mil. q / day
50 TB data, 100+ boxes
6. Meet SphinxQL
MySQL language
MySQL
MySQL connector
MySQL protocol
Application
SELECT * FROM mytable WHERE ...
SphinxQL language
MySQL connector
MySQL protocol
Sphinx
8. What Sphinx can do beside fulltext?
•
•
•
•
•
usual WHERE, ORDER, GROUP BY
GROUP BY custom extensions:
o WITHIN GROUP ORDER BY
o GROUP <N> BY
Aggregation, timestamp,math functions
Comparasion functions: IF(), INTERVAL(), IN()
Geo spatial: GEODIST(), GEOPOLY2D()
9. WITHIN GROUP ORDER BY
mysql> SELECT *,DAY(added) as today FROM facetdemo WHERE property2 = 160 AND today =26 GROUP BY brand_id WITHIN GROUP ORDER BY
price ASC ORDER BY brand_id ASC;
+---------+-------+----------+-----------+------------+---------------------+------------+----------+-------+
| id
| price | brand_id | property2 | added
| title
| brand_name | property | today |
+---------+-------+----------+-----------+------------+---------------------+------------+----------+-------+
| 520157 |
10 |
1 |
160 | 1382745486 | Product Nine Seven | brand1
| Three
|
26 |
| 1726473 |
10 |
2 |
160 | 1382796463 | Product Two Three
| brand2
| Eight
|
26 |
| 1588875 |
11 |
3 |
160 | 1382762264 | Product Three Six
| brand3
| Five
|
26 |
| 1556197 |
10 |
4 |
160 | 1382754018 | Product Eight Six
| brand4
| Seven
|
26 |
| 751443 |
11 |
5 |
160 | 1382803444 | Product Six Three
| brand5
| One
|
26 |
| 512776 |
11 |
6 |
160 | 1382743642 | Product Ten Five
| brand6
| Six
|
26 |
mysql> SELECT *,DAY(added) as today FROM facetdemo WHERE property2 = 160 AND today =26 GROUP BY brand_id WITHIN GROUP ORDER
BY price DESC ORDER BY brand_id ASC;
+---------+-------+----------+-----------+------------+---------------------+------------+----------+-------+
| id
| price | brand_id | property2 | added
| title
| brand_name | property | today |
+---------+-------+----------+-----------+------------+---------------------+------------+----------+-------+
| 815154 |
998 |
1 |
160 | 1382819286 | Product Two Nine
| brand1
| Eight
|
26 |
| 2793903 |
999 |
2 |
160 | 1382813601 | Product Eight Five | brand2
| Two
|
26 |
| 699831 | 1000 |
3 |
160 | 1382790589 | Product One Six
| brand3
| Eight
|
26 |
| 714052 | 1000 |
4 |
160 | 1382794137 | Product One Ten
| brand4
| Three
|
26 |
| 2791902 |
999 |
5 |
160 | 1382813140 | Product Five Three | brand5
| Four
|
26 |
| 2753725 | 1000 |
6 |
160 | 1382803662 | Product Seven Three | brand6
| Two
|
26 |
10. Using GROUP <N> BY
mysql>
SELECT * FROM facetdemo GROUP 3 BY brand_id WITHIN GROUP ORDER BY added DESC ORDER BY brand_id ASC;
+---------+-------+----------+------------+---------------------+------------+----------+
| id
| price | brand_id | added
| title
| brand_name | property |
+---------+-------+----------+------------+---------------------+------------+----------+
| 1479848 |
938 |
1 | 1382735889 | Product Ten Seven
| brand1
| Four
| 2479064 |
398 |
1 | 1382734998 | Product Ten Five
| brand1
| Eight
|
|
| 1480553 |
687 |
1 | 1382734048 | Product Four Two
| brand1
| One
|
| 1479580 |
62 |
2 | 1382734834 | Product Nine Seven
| brand2
| Ten
|
| 1479585 |
357 |
2 | 1382734834 | Product Six Two
| brand2
| Five
|
|
477383 |
908 |
2 | 1382733871 | Product Ten Three
| brand2
| Eight
|
| 2478429 |
425 |
3 | 1382734839 | Product Three Ten
| brand3
| Five
|
|
477456 |
519 |
3 | 1382734818 | Product Ten One
| brand3
| Six
|
|
477521 |
190 |
3 | 1382734403 | Product Three Two
| brand3
| Five
|
| 2478459 |
931 |
4 | 1382734850 | Product One Two
| brand4
| Five
|
| 1479718 |
891 |
4 | 1382734065 | Product Two One
| brand4
| Three
|
| 2478514 |
106 |
4 | 1382733868 | Product Six Seven
| brand4
| One
|
|
477297 |
991 |
5 | 1382734844 | Product Five Eight
| brand5
| Four
|
| 2479053 |
648 |
5 | 1382733994 | Product Six One
| brand5
| Nine
|
| 1480798 |
250 |
5 | 1382732121 | Product One Seven
| brand5
| Eight
|
11. Using HAVING
mysql> SELECT *,COUNT(*) FROM facetdemo where property2 = 190 and price>900 GROUP BY brand_id HAVING COUNT(*)>1000;
+-------+-------+----------+-----------+------------+-------------------+------------+----------+----------+
| id
| price | brand_id | property2 | added
| title
| brand_name | property | count(*) |
+-------+-------+----------+-----------+------------+-------------------+------------+----------+----------+
|
2566 |
934 |
24 |
190 | 1382615816 | Product One Three | brand24
| Six
|
1023 |
|
4807 |
905 |
11 |
190 | 1382616392 | Product Five Six
| brand11
| Eight
|
1023 |
|
5539 |
985 |
44 |
190 | 1382616552 | Product Ten Four
| brand44
| Three
|
1009 |
|
7655 |
912 |
10 |
190 | 1382617104 | Product Four Five | brand10
| Ten
|
1028 |
| 16837 |
968 |
20 |
190 | 1382619365 | Product One Nine
| Five
|
1015 |
| brand20
+-------+-------+----------+-----------+------------+-------------------+------------+----------+----------+
5 rows in set (0.17 sec)
12. Comparing simple queries
Operation
Example
MySQL
Sphinx
difference
Filter by integer, group by
integer
WHERE property_int =190
GROUP BY brand_id
0.32
0.14
2.2x
Group by integer, order by
count(*)
GROUP BY brand_id ORDER BY
COUNT(*) DESC
1.76
0.53
3.3x
Filter by integer, order by
timestamp
WHERE brand_id=20 ORDER BY
added ASC
0.00
0.14
0
Filter by integer, order by
timestamp and integer
column
WHERE brand_id=20 ORDER BY
added DESC, property_int ASC
0.31
0.19
1.5x
13. Using IF comparasion
mysql> SELECT COUNT(*), IF( property2=270 OR price<80, 1,
IF(property2=280 OR price> 900,2,3)
) AS expr
FROM facetdemo GROUP BY expr;
+----------+------+
| count(*) | expr |
+----------+------+
|
7494455 |
3 |
|
1357178 |
2 |
|
1148366 |
1 |
+----------+------+
3 rows in set (1.04 sec)
14. Using INTERVAL for segmentation
mysql> SELECT id, price, INTERVAL(price,0,300,600,900) AS pricerange, COUNT(*) FROM facetdemo WHERE
brand_id=27 GROUP BY pricerange ORDER BY pricerange ASC;
+------+-------+------------+----------+
| id
| price | pricerange | count(*) |
+------+-------+------------+----------+
|
219 |
196 |
1 |
58283 |
|
46 |
467 |
2 |
60535 |
|
109 |
667 |
3 |
60789 |
|
5 |
962 |
4 |
20285 |
+------+-------+------------+----------+
4 rows in set (0.19 sec)
15. Geo spatial in Sphinx
GEODIST(lat1, lon1, lat2, lon2, { option=value, ... })
o in
{ deg | degrees | rad | radians}
o out
{m | meters | km | ft | mi | miles }
o method
{haversine | adaptive}
haversine - high precision, expensive
adaptive - good precision, cheaper
(Polar flat-Earth algorithm )
16. • POLY2D(x1,y1,x2,y2,x3,y3, …)
• GEOPOLY2D (lat1,lng1,lat2,lng2,lat3,lng3,...)
•
lat/lng in degrees
• CONTAINTS( polygon, x, y )
mysql> SELECT *, CONTAINS(GEOPOLY2D(40.95164274496,-76.88583678218,41.188446201688,73.203723511772,39.900666261352,-74.171833538046,40.059260979044,76.301076056469),latitude_deg,longitude_deg) AS inside FROM geodemo WHERE inside=1
LIMIT 0,100 ;
21. Going further: JSON
•
•
starting with 2.1 Sphinx supports JSON
documents
useful for
{
o
o
"id": 1,
"gid": 2,
"title": "some title",
"tags":
[ "tag1", "tag2", "tag3" ],
"property": [
{
"name": "color",
"value": "blue"
},
{
"name": "weight",
"value": 2.56
}
]
unstructured data
complex one to many relations
}
22. JSON attributes
•
•
•
filter, sort and group
JSON/MVA array functions:
LENGTH(), LEAST(), GREATEST()
Advanced JSON search in array of objects:
ANY(), ALL(), INDEXOF()
23. Advanced searching in JSON
document :
SELECT *,ANY (
id : 1011
title : Hotel Sky
myjson: {
…
offers: {
( item.type = 1 AND
item.start > my_start_timestamp AND
item.end < my_end_timestamp )
{
‘type’ : 3,
‘start’ : start_timestamp,
‘end’: end_timestamp
},
{
‘type’ : 1,
‘start’ : start_timestamp,
‘end’: end_timestamp
}
…
}
}
FOR item IN myjson.offers
) AS condition
FROM index
WHERE condition =1
24. •
•
•
ANY ( cond FOR var IN json.array)
true if one element match condition
ALL ( cond FOR var IN json.array)
o true if all elements match condition
INDEXOF ( cond FOR var IN json.array)
o returns index key of first element that match
condition
o
26. SELECT *,(...) as heavy_expr
WHERE attr=x AND heavy_expr =1
No fulltext match, query does fullscan,
computes for whole collection the heavy
expression
SELECT *,(...) as heavy_expr
WHERE MATCH(‘attrx’) AND heavy_expr =1
Fulltext match, heavy expression is
computed only on result set returned by
fulltext match
27. Sphinx with FT filter
Operation
Example
MySQL
Sphinx w/o FT
Filter by integer,
order by
timestamp and
integer column
WHERE
brand_id=20
ORDER BY added
DESC, property_int
ASC
0.31
0.19
Fulltext filter,
order by
timestamp and
integer column
WHERE
MATCH(‘brand20’)
ORDER BY added
DESC, property_int
ASC
Sphinx with FT
0.13
28. Speed up geo spatial with fulltext
•
example: find items around a point in New York city in a
10km radius. Speed-up: search only items belonging to
New York states
mysql> SELECT *, GEODIST(0.710011075352, 1.2918035709982,latitude,longitude,{in=rad,out=km,method=adaptive}) as distance FROM geodemo WHERE
distance < 10 ORDER BY distance ASC LIMIT 0,10;10 rows in set (0.17 sec)
mysql> SELECT *, GEODIST(0.710011075352, 1.2918035709982,latitude,longitude,{in=rad,out=km,method=adaptive}) as distance FROM geodemo WHERE
MATCH('@state_code NY')
AND distance < 10 ORDER BY distance ASC LIMIT 0,10;10 rows in set (0.03 sec)