Skip to main content

Why does the PHP version run faster than MySQL


I have two very large tables to merge and so I have been trying to optomize the update for speed. I noticed that doing the update partially in PHP speeded it up significantly so I assumed this means I'm not be doing the MySQL properly.



I have simplified the problem to try and narrow it down ...



GRID_TABLE POSTCODE_TABLE
idNo, lat, lng, nearestPostcode postcode, lat, lng
________________________________ _____________________
1 57.1 -2.3 - AB12 3BA 56.3 -2.5
2 56.8 -1.9 - AB12 1YA 56.2 -2.3
. . . . . .

(200 entries) (35,000 entries)


I want to update the GRID_TABLE with the nearestPostcode from the POSTCODE_TABLE using latitude (lat) and longitude (lng) to find the nearest postcode to each grid point...




update grid_table set nearestPostcode = (
select postcode from postcode_table
where lat > grid_table.lat -0.0037 and lat < grid_table.lat +0.0037
and lng > grid_table.lng -0.0068 and lng < grid_table.lng +0.0068
order by POW(lat - grid_table.lat,2) + POW((lng - grid_table.lng) *0.546,2)
limit 1
)



The idea is that the 'where' clause speeds up the search by using indexes to narrow the set down to a few candidates and then the 'order by' clause finds the nearest one within this set.



This MySQL update takes 30 secs, but if I instead update each GRID_TABLE row individually in PHP it's over in the blink of an eye.




$queryStg = "select * from grid_table ;";
$sqlQuery1 = mysqli_query($mysqliLink, $queryStg);

while( $sqlRow = mysqli_fetch_assoc( $sqlQuery1 ) ) {

$idNo = $sqlRow['idNo'];
$lat = $sqlRow['lat'];
$lng = $sqlRow['lng'];

$queryStg = "
update grid_table
set nearestPostcode = (
SELECT postcode
FROM postcode_table
where
lat > " . ($lat - 0.0037) . " and
lat < " . ($lat + 0.0037) . " and
lng > " . ($lng - 0.0068) . " and
lng < " . ($lng + 0.0068) . "
ORDER BY
POW(lat - $lat, 2) +
POW((lng - $lng) * 0.546, 2)
ASC
limit 1
)
where idNo= $idNo;
";

$sqlQuery2 = mysqli_query($mysqliLink, $queryStg);

}



Surely the MySQL version should be faster than the PHP version?



Here is the MySQL for the tables...



CREATE TABLE `grid_table` (
`idNo` INT(11) NOT NULL AUTO_INCREMENT,
`lat` FLOAT(6,4) NOT NULL COMMENT 'latitude',
`lng` FLOAT(6,4) NOT NULL COMMENT 'longitude',
`nearestPostcode` CHAR(8) NOT NULL,
PRIMARY KEY (`idNo`),
INDEX `lat_lng` (`lat`, `lng`)
)
ENGINE=MyISAM
ROW_FORMAT=DEFAULT
AUTO_INCREMENT=30047


CREATE TABLE `postcode_table` (
`postcode` CHAR(8) NOT NULL,
`lat` FLOAT(6,4) NOT NULL COMMENT 'latitude',
`lng` FLOAT(6,4) NOT NULL COMMENT 'longitude',
PRIMARY KEY (`postcode`),
INDEX `lat` (`lat`),
INDEX `lng` (`lng`),
INDEX `lat_lng` (`lat`, `lng`)
)
ENGINE=MyISAM
ROW_FORMAT=DEFAULT


MySQL import file is here... https://docs.google.com/leaf?id=0B93lksnTC7_cM2Y2ZDk1Y2YtMGQ3Yy00OTIxLTk0ZDAtZmE2NmQ3YTc1ZWRm&hl=en



(if you run the UPDATE, 10 nearestPostcodes will be added).



UPDATE AFTER ANSWERS...



I ran this...




explain extended
SELECT postcode FROM postcode_table
where lat > 57.0 and lat < 57.0074
and lng > -2.013 and lng < -2
ORDER BY POW(lat - 57.0, 2) + POW((lng - -2) * 0.546, 2) ASC



It returned...



id,select_type,table,type,possible_keys,key,key_len,ref,rows,filtered,Extra
1,SIMPLE,postcode_table,range,lat,lng,lat_lng,lat_lng,8,NULL,65,100.00,Using where; Using filesort


Removing the 'order by' caluse -> no difference in speed.



Simplifying the 'where' clause by removing 'lng', ie



where lat between grid_table.lat - 0.0037 and grid_table.lat + 0.0037
-> faster: 3 secs rather than 30 secs.


Using spatial column and index (see below) -> much slower (190 sec). Not sure if I implemented this correctly though.



ALTER TABLE `grid_table` ADD COLUMN `coords` POINT NOT NULL;
update grid_table set coords = POINT(lat, lng);
ALTER TABLE `grid_table` ADD SPATIAL INDEX `coords` (`coords`);

ALTER TABLE `postcode_table` ADD COLUMN `coords` POINT NOT NULL;
update postcode_table set coords = POINT(lat, lng);
ALTER TABLE `postcode_table` ADD SPATIAL INDEX `coords` (`coords`);

analyze table grid_table;
optimize table grid_table;
analyze table postcode_table;
optimize table postcode_table;


update grid_table set nearestPostcode = (
select postcode from postcode_table
WHERE MBRContains(GeomFromText(concat(
'POLYGON((',
grid_table.lat - 0.0037, ' ', grid_table.lng - 0.0068, ', ',
grid_table.lat - 0.0037, ' ', grid_table.lng + 0.0068, ', ',
grid_table.lat + 0.0037, ' ', grid_table.lng - 0.0068, ', ',
grid_table.lat - 0.0037, ' ', grid_table.lng - 0.0068,
'))')), postcode_table.coords)
order by POW(lat - grid_table.lat,2) + POW((lng - grid_table.lng) *0.546,2)
limit 1
)

Source: Tips4allCCNA FINAL EXAM

Comments

  1. In your MySQL version your subquery works with all 30000 grid_table records, whether in your PHP version -- it's only one. As you add where on outer table PK.

    I suggest you here to change update query. For example, try to make it without subquery, multiple-update as here http://dev.mysql.com/doc/refman/5.0/en/update.html.

    I believe it should help.

    Something like:

    update grid_table, postcode_table
    set grid_table.nearestPostcode = postcode_table.postcode
    where postcode_table.lat > grid_table.lat - 0.0037
    and postcode_table.lat < grid_table.lat + 0.0037
    and postcode_table.lng > grid_table.lng - 0.0068
    and lng < grid_table.lng + 0.0068
    group by grid_table.idNo
    having (POW(lat - grid_table.lat,2) + POW((lng - grid_table.lng) *0.546,2)) = min(POW(lat - grid_table.lat,2) + POW((lng - grid_table.lng) *0.546,2))


    May be this version could help, but I`m not sure. I assume, the main root problem in your 1st version is subquery over all records.

    To have explain update, you can "convert" it to similar select:

    explain
    select
    *,
    (
    select postcode from postcode_table
    where lat > grid_table.lat -0.0037 and lat < grid_table.lat +0.0037
    and lng > grid_table.lng -0.0068 and lng < grid_table.lng +0.0068
    order by POW(lat - grid_table.lat,2) + POW((lng - grid_table.lng) *0.546,2)
    limit 1
    ) nearestPostcode
    from grid_table


    And you will see:

    id select_type table type possible_keys key key_len ref rows Extra
    1 PRIMARY grid_table ALL 224
    2 DEPENDENT SUBQUERY postcode_table ALL lat,lng,lat_lng 35605 Using where; Using temporary; Using filesort


    But in case of idNo we have:

    explain
    select
    *,
    (
    select postcode from postcode_table
    where lat > grid_table.lat -0.0037 and lat < grid_table.lat +0.0037
    and lng > grid_table.lng -0.0068 and lng < grid_table.lng +0.0068
    order by POW(lat - grid_table.lat,2) + POW((lng - grid_table.lng) *0.546,2)
    limit 1
    ) nearestPostcode
    from grid_table
    where idNo = 1487;

    id select_type table type possible_keys key key_len ref rows Extra
    1 PRIMARY grid_table const PRIMARY PRIMARY 4 const 1
    2 DEPENDENT SUBQUERY postcode_table range lat,lng,lat_lng lat 4 18 Using where; Using filesort


    So we have 35605 rows vs ~18*224 (~4000).

    To find correct query try to find good select 1st.

    Update

    Subquery isn't a root here :( So I think we should try some precalculated + indexed column may be. Target is to avoid order by SOMEFUNC()

    ReplyDelete
  2. Look at the execution plan to know what is taking so long. http://dev.mysql.com/doc/refman/5.5/en/execution-plan-information.html

    ReplyDelete
  3. My guess is that the difference is due to your providing the value of $lat with in the row-by-row query, thereby saving large scans for the lookup here:-

    order by POW(lat - grid_table.lat,2)


    Like Mr47 says, you'll be able to see by EXPLAIN-ing the SQL statements.

    ReplyDelete

Post a Comment

Popular posts from this blog

Slow Android emulator

I have a 2.67 GHz Celeron processor, 1.21 GB of RAM on a x86 Windows XP Professional machine. My understanding is that the Android emulator should start fairly quickly on such a machine, but for me it does not. I have followed all instructions in setting up the IDE, SDKs, JDKs and such and have had some success in staring the emulator quickly but is very particulary. How can I, if possible, fix this problem?

CCNA 3 Final Exam => latest version

1 . Which security protocol or measure would provide the greatest protection for a wireless LAN? WPA2 cloaking SSIDs shared WEP key MAC address filtering   2 . Refer to the exhibit. All trunk links are operational and all VLANs are allowed on all trunk links. An ARP request is sent by computer 5. Which device or devices will receive this message? only computer 4 computer 3 and RTR-A computer 4 and RTR-A computer 1, computer 2, computer 4, and RTR-A computer 1, computer 2, computer 3, computer 4, and RTR-A all of the computers and the router   3 . Refer to the exhibit. Hosts A and B, connected to hub HB1, attempt to transmit a frame at the same time but a collision occurs. Which hosts will receive the collision jamming signal? only hosts A and B only hosts A, B, and C only hosts A, B, C, and D only hosts A, B, C, and E   4 . Refer to the exhibit. Router RA receives a packet with a source address of 192.168.1.65 and a destination address of 192.168.1.161...