Skip to main content

Why does the PHP version run faster than MySQL


I have two very large tables to merge and so I have been trying to optomize the update for speed. I noticed that doing the update partially in PHP speeded it up significantly so I assumed this means I'm not be doing the MySQL properly.



I have simplified the problem to try and narrow it down ...



GRID_TABLE POSTCODE_TABLE
idNo, lat, lng, nearestPostcode postcode, lat, lng
________________________________ _____________________
1 57.1 -2.3 - AB12 3BA 56.3 -2.5
2 56.8 -1.9 - AB12 1YA 56.2 -2.3
. . . . . .

(200 entries) (35,000 entries)


I want to update the GRID_TABLE with the nearestPostcode from the POSTCODE_TABLE using latitude (lat) and longitude (lng) to find the nearest postcode to each grid point...




update grid_table set nearestPostcode = (
select postcode from postcode_table
where lat > grid_table.lat -0.0037 and lat < grid_table.lat +0.0037
and lng > grid_table.lng -0.0068 and lng < grid_table.lng +0.0068
order by POW(lat - grid_table.lat,2) + POW((lng - grid_table.lng) *0.546,2)
limit 1
)



The idea is that the 'where' clause speeds up the search by using indexes to narrow the set down to a few candidates and then the 'order by' clause finds the nearest one within this set.



This MySQL update takes 30 secs, but if I instead update each GRID_TABLE row individually in PHP it's over in the blink of an eye.




$queryStg = "select * from grid_table ;";
$sqlQuery1 = mysqli_query($mysqliLink, $queryStg);

while( $sqlRow = mysqli_fetch_assoc( $sqlQuery1 ) ) {

$idNo = $sqlRow['idNo'];
$lat = $sqlRow['lat'];
$lng = $sqlRow['lng'];

$queryStg = "
update grid_table
set nearestPostcode = (
SELECT postcode
FROM postcode_table
where
lat > " . ($lat - 0.0037) . " and
lat < " . ($lat + 0.0037) . " and
lng > " . ($lng - 0.0068) . " and
lng < " . ($lng + 0.0068) . "
ORDER BY
POW(lat - $lat, 2) +
POW((lng - $lng) * 0.546, 2)
ASC
limit 1
)
where idNo= $idNo;
";

$sqlQuery2 = mysqli_query($mysqliLink, $queryStg);

}



Surely the MySQL version should be faster than the PHP version?



Here is the MySQL for the tables...



CREATE TABLE `grid_table` (
`idNo` INT(11) NOT NULL AUTO_INCREMENT,
`lat` FLOAT(6,4) NOT NULL COMMENT 'latitude',
`lng` FLOAT(6,4) NOT NULL COMMENT 'longitude',
`nearestPostcode` CHAR(8) NOT NULL,
PRIMARY KEY (`idNo`),
INDEX `lat_lng` (`lat`, `lng`)
)
ENGINE=MyISAM
ROW_FORMAT=DEFAULT
AUTO_INCREMENT=30047


CREATE TABLE `postcode_table` (
`postcode` CHAR(8) NOT NULL,
`lat` FLOAT(6,4) NOT NULL COMMENT 'latitude',
`lng` FLOAT(6,4) NOT NULL COMMENT 'longitude',
PRIMARY KEY (`postcode`),
INDEX `lat` (`lat`),
INDEX `lng` (`lng`),
INDEX `lat_lng` (`lat`, `lng`)
)
ENGINE=MyISAM
ROW_FORMAT=DEFAULT


MySQL import file is here... https://docs.google.com/leaf?id=0B93lksnTC7_cM2Y2ZDk1Y2YtMGQ3Yy00OTIxLTk0ZDAtZmE2NmQ3YTc1ZWRm&hl=en



(if you run the UPDATE, 10 nearestPostcodes will be added).



UPDATE AFTER ANSWERS...



I ran this...




explain extended
SELECT postcode FROM postcode_table
where lat > 57.0 and lat < 57.0074
and lng > -2.013 and lng < -2
ORDER BY POW(lat - 57.0, 2) + POW((lng - -2) * 0.546, 2) ASC



It returned...



id,select_type,table,type,possible_keys,key,key_len,ref,rows,filtered,Extra
1,SIMPLE,postcode_table,range,lat,lng,lat_lng,lat_lng,8,NULL,65,100.00,Using where; Using filesort


Removing the 'order by' caluse -> no difference in speed.



Simplifying the 'where' clause by removing 'lng', ie



where lat between grid_table.lat - 0.0037 and grid_table.lat + 0.0037
-> faster: 3 secs rather than 30 secs.


Using spatial column and index (see below) -> much slower (190 sec). Not sure if I implemented this correctly though.



ALTER TABLE `grid_table` ADD COLUMN `coords` POINT NOT NULL;
update grid_table set coords = POINT(lat, lng);
ALTER TABLE `grid_table` ADD SPATIAL INDEX `coords` (`coords`);

ALTER TABLE `postcode_table` ADD COLUMN `coords` POINT NOT NULL;
update postcode_table set coords = POINT(lat, lng);
ALTER TABLE `postcode_table` ADD SPATIAL INDEX `coords` (`coords`);

analyze table grid_table;
optimize table grid_table;
analyze table postcode_table;
optimize table postcode_table;


update grid_table set nearestPostcode = (
select postcode from postcode_table
WHERE MBRContains(GeomFromText(concat(
'POLYGON((',
grid_table.lat - 0.0037, ' ', grid_table.lng - 0.0068, ', ',
grid_table.lat - 0.0037, ' ', grid_table.lng + 0.0068, ', ',
grid_table.lat + 0.0037, ' ', grid_table.lng - 0.0068, ', ',
grid_table.lat - 0.0037, ' ', grid_table.lng - 0.0068,
'))')), postcode_table.coords)
order by POW(lat - grid_table.lat,2) + POW((lng - grid_table.lng) *0.546,2)
limit 1
)

Source: Tips4allCCNA FINAL EXAM

Comments

  1. In your MySQL version your subquery works with all 30000 grid_table records, whether in your PHP version -- it's only one. As you add where on outer table PK.

    I suggest you here to change update query. For example, try to make it without subquery, multiple-update as here http://dev.mysql.com/doc/refman/5.0/en/update.html.

    I believe it should help.

    Something like:

    update grid_table, postcode_table
    set grid_table.nearestPostcode = postcode_table.postcode
    where postcode_table.lat > grid_table.lat - 0.0037
    and postcode_table.lat < grid_table.lat + 0.0037
    and postcode_table.lng > grid_table.lng - 0.0068
    and lng < grid_table.lng + 0.0068
    group by grid_table.idNo
    having (POW(lat - grid_table.lat,2) + POW((lng - grid_table.lng) *0.546,2)) = min(POW(lat - grid_table.lat,2) + POW((lng - grid_table.lng) *0.546,2))


    May be this version could help, but I`m not sure. I assume, the main root problem in your 1st version is subquery over all records.

    To have explain update, you can "convert" it to similar select:

    explain
    select
    *,
    (
    select postcode from postcode_table
    where lat > grid_table.lat -0.0037 and lat < grid_table.lat +0.0037
    and lng > grid_table.lng -0.0068 and lng < grid_table.lng +0.0068
    order by POW(lat - grid_table.lat,2) + POW((lng - grid_table.lng) *0.546,2)
    limit 1
    ) nearestPostcode
    from grid_table


    And you will see:

    id select_type table type possible_keys key key_len ref rows Extra
    1 PRIMARY grid_table ALL 224
    2 DEPENDENT SUBQUERY postcode_table ALL lat,lng,lat_lng 35605 Using where; Using temporary; Using filesort


    But in case of idNo we have:

    explain
    select
    *,
    (
    select postcode from postcode_table
    where lat > grid_table.lat -0.0037 and lat < grid_table.lat +0.0037
    and lng > grid_table.lng -0.0068 and lng < grid_table.lng +0.0068
    order by POW(lat - grid_table.lat,2) + POW((lng - grid_table.lng) *0.546,2)
    limit 1
    ) nearestPostcode
    from grid_table
    where idNo = 1487;

    id select_type table type possible_keys key key_len ref rows Extra
    1 PRIMARY grid_table const PRIMARY PRIMARY 4 const 1
    2 DEPENDENT SUBQUERY postcode_table range lat,lng,lat_lng lat 4 18 Using where; Using filesort


    So we have 35605 rows vs ~18*224 (~4000).

    To find correct query try to find good select 1st.

    Update

    Subquery isn't a root here :( So I think we should try some precalculated + indexed column may be. Target is to avoid order by SOMEFUNC()

    ReplyDelete
  2. Look at the execution plan to know what is taking so long. http://dev.mysql.com/doc/refman/5.5/en/execution-plan-information.html

    ReplyDelete
  3. My guess is that the difference is due to your providing the value of $lat with in the row-by-row query, thereby saving large scans for the lookup here:-

    order by POW(lat - grid_table.lat,2)


    Like Mr47 says, you'll be able to see by EXPLAIN-ing the SQL statements.

    ReplyDelete

Post a Comment

Popular posts from this blog

[韓日関係] 首相含む大幅な内閣改造の可能性…早ければ来月10日ごろ=韓国

div not scrolling properly with slimScroll plugin

I am using the slimScroll plugin for jQuery by Piotr Rochala Which is a great plugin for nice scrollbars on most browsers but I am stuck because I am using it for a chat box and whenever the user appends new text to the boxit does scroll using the .scrollTop() method however the plugin's scrollbar doesnt scroll with it and when the user wants to look though the chat history it will start scrolling from near the top. I have made a quick demo of my situation http://jsfiddle.net/DY9CT/2/ Does anyone know how to solve this problem?

Why does this javascript based printing cause Safari to refresh the page?

The page I am working on has a javascript function executed to print parts of the page. For some reason, printing in Safari, causes the window to somehow update. I say somehow, because it does not really refresh as in reload the page, but rather it starts the "rendering" of the page from start, i.e. scroll to top, flash animations start from 0, and so forth. The effect is reproduced by this fiddle: http://jsfiddle.net/fYmnB/ Clicking the print button and finishing or cancelling a print in Safari causes the screen to "go white" for a sec, which in my real website manifests itself as something "like" a reload. While running print button with, let's say, Firefox, just opens and closes the print dialogue without affecting the fiddle page in any way. Is there something with my way of calling the browsers print method that causes this, or how can it be explained - and preferably, avoided? P.S.: On my real site the same occurs with Chrome. In the ex