Skip to main content

Why does the PHP version run faster than MySQL


I have two very large tables to merge and so I have been trying to optomize the update for speed. I noticed that doing the update partially in PHP speeded it up significantly so I assumed this means I'm not be doing the MySQL properly.



I have simplified the problem to try and narrow it down ...



GRID_TABLE POSTCODE_TABLE
idNo, lat, lng, nearestPostcode postcode, lat, lng
________________________________ _____________________
1 57.1 -2.3 - AB12 3BA 56.3 -2.5
2 56.8 -1.9 - AB12 1YA 56.2 -2.3
. . . . . .

(200 entries) (35,000 entries)


I want to update the GRID_TABLE with the nearestPostcode from the POSTCODE_TABLE using latitude (lat) and longitude (lng) to find the nearest postcode to each grid point...




update grid_table set nearestPostcode = (
select postcode from postcode_table
where lat > grid_table.lat -0.0037 and lat < grid_table.lat +0.0037
and lng > grid_table.lng -0.0068 and lng < grid_table.lng +0.0068
order by POW(lat - grid_table.lat,2) + POW((lng - grid_table.lng) *0.546,2)
limit 1
)



The idea is that the 'where' clause speeds up the search by using indexes to narrow the set down to a few candidates and then the 'order by' clause finds the nearest one within this set.



This MySQL update takes 30 secs, but if I instead update each GRID_TABLE row individually in PHP it's over in the blink of an eye.




$queryStg = "select * from grid_table ;";
$sqlQuery1 = mysqli_query($mysqliLink, $queryStg);

while( $sqlRow = mysqli_fetch_assoc( $sqlQuery1 ) ) {

$idNo = $sqlRow['idNo'];
$lat = $sqlRow['lat'];
$lng = $sqlRow['lng'];

$queryStg = "
update grid_table
set nearestPostcode = (
SELECT postcode
FROM postcode_table
where
lat > " . ($lat - 0.0037) . " and
lat < " . ($lat + 0.0037) . " and
lng > " . ($lng - 0.0068) . " and
lng < " . ($lng + 0.0068) . "
ORDER BY
POW(lat - $lat, 2) +
POW((lng - $lng) * 0.546, 2)
ASC
limit 1
)
where idNo= $idNo;
";

$sqlQuery2 = mysqli_query($mysqliLink, $queryStg);

}



Surely the MySQL version should be faster than the PHP version?



Here is the MySQL for the tables...



CREATE TABLE `grid_table` (
`idNo` INT(11) NOT NULL AUTO_INCREMENT,
`lat` FLOAT(6,4) NOT NULL COMMENT 'latitude',
`lng` FLOAT(6,4) NOT NULL COMMENT 'longitude',
`nearestPostcode` CHAR(8) NOT NULL,
PRIMARY KEY (`idNo`),
INDEX `lat_lng` (`lat`, `lng`)
)
ENGINE=MyISAM
ROW_FORMAT=DEFAULT
AUTO_INCREMENT=30047


CREATE TABLE `postcode_table` (
`postcode` CHAR(8) NOT NULL,
`lat` FLOAT(6,4) NOT NULL COMMENT 'latitude',
`lng` FLOAT(6,4) NOT NULL COMMENT 'longitude',
PRIMARY KEY (`postcode`),
INDEX `lat` (`lat`),
INDEX `lng` (`lng`),
INDEX `lat_lng` (`lat`, `lng`)
)
ENGINE=MyISAM
ROW_FORMAT=DEFAULT


MySQL import file is here... https://docs.google.com/leaf?id=0B93lksnTC7_cM2Y2ZDk1Y2YtMGQ3Yy00OTIxLTk0ZDAtZmE2NmQ3YTc1ZWRm&hl=en



(if you run the UPDATE, 10 nearestPostcodes will be added).



UPDATE AFTER ANSWERS...



I ran this...




explain extended
SELECT postcode FROM postcode_table
where lat > 57.0 and lat < 57.0074
and lng > -2.013 and lng < -2
ORDER BY POW(lat - 57.0, 2) + POW((lng - -2) * 0.546, 2) ASC



It returned...



id,select_type,table,type,possible_keys,key,key_len,ref,rows,filtered,Extra
1,SIMPLE,postcode_table,range,lat,lng,lat_lng,lat_lng,8,NULL,65,100.00,Using where; Using filesort


Removing the 'order by' caluse -> no difference in speed.



Simplifying the 'where' clause by removing 'lng', ie



where lat between grid_table.lat - 0.0037 and grid_table.lat + 0.0037
-> faster: 3 secs rather than 30 secs.


Using spatial column and index (see below) -> much slower (190 sec). Not sure if I implemented this correctly though.



ALTER TABLE `grid_table` ADD COLUMN `coords` POINT NOT NULL;
update grid_table set coords = POINT(lat, lng);
ALTER TABLE `grid_table` ADD SPATIAL INDEX `coords` (`coords`);

ALTER TABLE `postcode_table` ADD COLUMN `coords` POINT NOT NULL;
update postcode_table set coords = POINT(lat, lng);
ALTER TABLE `postcode_table` ADD SPATIAL INDEX `coords` (`coords`);

analyze table grid_table;
optimize table grid_table;
analyze table postcode_table;
optimize table postcode_table;


update grid_table set nearestPostcode = (
select postcode from postcode_table
WHERE MBRContains(GeomFromText(concat(
'POLYGON((',
grid_table.lat - 0.0037, ' ', grid_table.lng - 0.0068, ', ',
grid_table.lat - 0.0037, ' ', grid_table.lng + 0.0068, ', ',
grid_table.lat + 0.0037, ' ', grid_table.lng - 0.0068, ', ',
grid_table.lat - 0.0037, ' ', grid_table.lng - 0.0068,
'))')), postcode_table.coords)
order by POW(lat - grid_table.lat,2) + POW((lng - grid_table.lng) *0.546,2)
limit 1
)

Source: Tips4allCCNA FINAL EXAM

Comments

  1. In your MySQL version your subquery works with all 30000 grid_table records, whether in your PHP version -- it's only one. As you add where on outer table PK.

    I suggest you here to change update query. For example, try to make it without subquery, multiple-update as here http://dev.mysql.com/doc/refman/5.0/en/update.html.

    I believe it should help.

    Something like:

    update grid_table, postcode_table
    set grid_table.nearestPostcode = postcode_table.postcode
    where postcode_table.lat > grid_table.lat - 0.0037
    and postcode_table.lat < grid_table.lat + 0.0037
    and postcode_table.lng > grid_table.lng - 0.0068
    and lng < grid_table.lng + 0.0068
    group by grid_table.idNo
    having (POW(lat - grid_table.lat,2) + POW((lng - grid_table.lng) *0.546,2)) = min(POW(lat - grid_table.lat,2) + POW((lng - grid_table.lng) *0.546,2))


    May be this version could help, but I`m not sure. I assume, the main root problem in your 1st version is subquery over all records.

    To have explain update, you can "convert" it to similar select:

    explain
    select
    *,
    (
    select postcode from postcode_table
    where lat > grid_table.lat -0.0037 and lat < grid_table.lat +0.0037
    and lng > grid_table.lng -0.0068 and lng < grid_table.lng +0.0068
    order by POW(lat - grid_table.lat,2) + POW((lng - grid_table.lng) *0.546,2)
    limit 1
    ) nearestPostcode
    from grid_table


    And you will see:

    id select_type table type possible_keys key key_len ref rows Extra
    1 PRIMARY grid_table ALL 224
    2 DEPENDENT SUBQUERY postcode_table ALL lat,lng,lat_lng 35605 Using where; Using temporary; Using filesort


    But in case of idNo we have:

    explain
    select
    *,
    (
    select postcode from postcode_table
    where lat > grid_table.lat -0.0037 and lat < grid_table.lat +0.0037
    and lng > grid_table.lng -0.0068 and lng < grid_table.lng +0.0068
    order by POW(lat - grid_table.lat,2) + POW((lng - grid_table.lng) *0.546,2)
    limit 1
    ) nearestPostcode
    from grid_table
    where idNo = 1487;

    id select_type table type possible_keys key key_len ref rows Extra
    1 PRIMARY grid_table const PRIMARY PRIMARY 4 const 1
    2 DEPENDENT SUBQUERY postcode_table range lat,lng,lat_lng lat 4 18 Using where; Using filesort


    So we have 35605 rows vs ~18*224 (~4000).

    To find correct query try to find good select 1st.

    Update

    Subquery isn't a root here :( So I think we should try some precalculated + indexed column may be. Target is to avoid order by SOMEFUNC()

    ReplyDelete
  2. Look at the execution plan to know what is taking so long. http://dev.mysql.com/doc/refman/5.5/en/execution-plan-information.html

    ReplyDelete
  3. My guess is that the difference is due to your providing the value of $lat with in the row-by-row query, thereby saving large scans for the lookup here:-

    order by POW(lat - grid_table.lat,2)


    Like Mr47 says, you'll be able to see by EXPLAIN-ing the SQL statements.

    ReplyDelete

Post a Comment

Popular posts from this blog

Why is this Javascript much *slower* than its jQuery equivalent?

I have a HTML list of about 500 items and a "filter" box above it. I started by using jQuery to filter the list when I typed a letter (timing code added later): $('#filter').keyup( function() { var jqStart = (new Date).getTime(); var search = $(this).val().toLowerCase(); var $list = $('ul.ablist > li'); $list.each( function() { if ( $(this).text().toLowerCase().indexOf(search) === -1 ) $(this).hide(); else $(this).show(); } ); console.log('Time: ' + ((new Date).getTime() - jqStart)); } ); However, there was a couple of seconds delay after typing each letter (particularly the first letter). So I thought it may be slightly quicker if I used plain Javascript (I read recently that jQuery's each function is particularly slow). Here's my JS equivalent: document.getElementById('filter').addEventListener( 'keyup', function () { var jsStart = (new Date).getTime()...

Is it possible to have IF statement in an Echo statement in PHP

Thanks in advance. I did look at the other questions/answers that were similar and didn't find exactly what I was looking for. I'm trying to do this, am I on the right path? echo " <div id='tabs-".$match."'> <textarea id='".$match."' name='".$match."'>". if ($COLUMN_NAME === $match) { echo $FIELD_WITH_COLUMN_NAME; } else { } ."</textarea> <script type='text/javascript'> CKEDITOR.replace( '".$match."' ); </script> </div>"; I am getting the following error message in the browser: Parse error: syntax error, unexpected T_IF Please let me know if this is the right way to go about nesting an IF statement inside an echo. Thank you.