
Are there alternative data structures to arrays in PHP that let me benefit from different indexing techniques?


Recently I had an issue with an array that contained several hundred thousand values, and all I wanted to do was check whether a value was already present. In my case these were IP addresses from a web server log. So basically something like:



in_array(ip2long($ip), $myarray) did the job



However, the lookup time grew dramatically: 10k lookups took around 17 seconds.



In this case I didn't care about duplicates; I only needed to check for existence. So I could store the IPs as array keys instead and check like this:




isset($myarray[ip2long($ip)])



And boom, lookup times went down from 17 seconds (and more) to a steady 0.8 seconds for 10k lookups. As the value for each array entry I just used the integer 1.
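A minimal sketch of the key-as-set approach described above, assuming the IPs arrive as an array of dotted-quad strings. The sample addresses and the function names `buildIpSet` and `hasIp` are hypothetical. Since ip2long() returns false for invalid input, such entries are skipped.

```php
<?php
// Build a "set" of IPs by using the ip2long() integer as the array key.
// The stored value is irrelevant; 1 is just a placeholder.
function buildIpSet(array $ips): array
{
    $set = [];
    foreach ($ips as $ip) {
        $long = ip2long($ip);
        if ($long !== false) {   // ip2long() returns false for invalid addresses
            $set[$long] = 1;     // duplicates simply overwrite the same key
        }
    }
    return $set;
}

// Membership test via a hash lookup on the key (O(1) on average).
function hasIp(array $set, string $ip): bool
{
    $long = ip2long($ip);
    return $long !== false && isset($set[$long]);
}

$set = buildIpSet(['192.168.0.1', '10.0.0.7', '192.168.0.1', 'not-an-ip']);
var_dump(hasIp($set, '10.0.0.7'));    // bool(true)
var_dump(hasIp($set, '172.16.0.1'));  // bool(false)
var_dump(count($set));                // int(2)
```

Building the set costs one pass over the data; every later check is a single key lookup instead of a scan.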



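My guess was that the value search is based on some B-tree with log(n) lookup time and the key index on a hash map. In fact, in_array() scans the array linearly (O(n) per call), while key lookups go through PHP's hash table in O(1) on average.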
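For contrast, a genuinely O(log n) lookup, which is what a tree would give, can be sketched with binary search over a sorted list of ip2long() values. This is a swapped-in technique for illustration, not how PHP arrays work internally; `sortedIpKeys`, `lowerBound`, `containsSorted` and `countInRange` are hypothetical names. Unlike a hash lookup, a sorted list also supports range queries.

```php
<?php
// Convert IPs to integers, drop invalid entries and duplicates, sort ascending.
function sortedIpKeys(array $ips): array
{
    $longs = [];
    foreach ($ips as $ip) {
        $long = ip2long($ip);
        if ($long !== false) {
            $longs[] = $long;
        }
    }
    $longs = array_unique($longs);
    sort($longs, SORT_NUMERIC);   // sort() also reindexes the keys 0..n-1
    return $longs;
}

// Index of the first element >= $needle, or count($sorted) if there is none.
function lowerBound(array $sorted, int $needle): int
{
    $lo = 0;
    $hi = count($sorted);
    while ($lo < $hi) {
        $mid = intdiv($lo + $hi, 2);
        if ($sorted[$mid] < $needle) {
            $lo = $mid + 1;
        } else {
            $hi = $mid;
        }
    }
    return $lo;
}

// Existence check in O(log n) comparisons.
function containsSorted(array $sorted, int $needle): bool
{
    $i = lowerBound($sorted, $needle);
    return $i < count($sorted) && $sorted[$i] === $needle;
}

// Range query: number of stored IPs in [$from, $to], inclusive.
function countInRange(array $sorted, int $from, int $to): int
{
    return lowerBound($sorted, $to + 1) - lowerBound($sorted, $from);
}

$sorted = sortedIpKeys(['10.0.0.5', '192.168.1.20', '10.0.0.9', '10.0.0.5']);
var_dump(containsSorted($sorted, ip2long('10.0.0.9')));                         // bool(true)
var_dump(countInRange($sorted, ip2long('10.0.0.0'), ip2long('10.0.0.255')));   // int(2)
```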



In my case using the keys worked fine, but are there any data structures that use a hash map as a value index while still allowing duplicate values? (I realize this only makes sense if there are not too many duplicates, and that I lose efficient range/search queries, which are the primary benefit of tree structures.)
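Plain PHP arrays can already act as such a value index if each key maps to a list: the value becomes the hash key, and every position where it occurs is appended to a nested array. A minimal sketch with hypothetical function names `buildValueIndex` and `positionsOf`; it assumes the values are usable as array keys (ints or strings; other types get cast), costs extra memory, and, as noted, offers no range queries.

```php
<?php
// Map each value to the list of positions (e.g. log line numbers) where it occurs.
// Duplicates are allowed: each one just appends another position.
function buildValueIndex(array $values): array
{
    $index = [];
    foreach ($values as $position => $value) {
        $index[$value][] = $position;
    }
    return $index;
}

// All positions of $value, or an empty array if it never occurs.
function positionsOf(array $index, $value): array
{
    return $index[$value] ?? [];
}

$log = ['10.0.0.1', '10.0.0.2', '10.0.0.1', '10.0.0.3', '10.0.0.1'];
$index = buildValueIndex($log);
print_r(positionsOf($index, '10.0.0.1'));   // positions 0, 2 and 4
var_dump(isset($index['10.0.0.9']));        // bool(false)
```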


Source: Tips4all

Comments

  1. There is a whole range of alternative data structures beyond simple arrays in the SPL library bundled with PHP, including linked lists, stacks, heaps, queues, etc.

    However, I suspect you could make your logic a whole lot more efficient if you flipped your array (array_flip()), so that you look up the key (using array_key_exists()) rather than search for the value. The array index is a hash table rather than a B-tree, which makes direct access by key very fast.

    However, if you're working with 10k entries in an array, you'd probably be better off using a database, where you can define your own indexes.

  2. Arrays have a sequential order, and accessing a specific element by key is quick because you don't need to traverse a tree or walk through a sequential list structure.

    A set is of course faster here, because you only check unique elements rather than all elements in the array.

    Trees are fine for sorted structures, for example. You could implement a tree with IPs sorted by their ranges, and then decide more quickly whether a given IP exists.
    I'm not sure whether PHP provides such customized tree structures. I guess you'll need to implement one yourself, but that should take about half an hour.

    You'll find sample code for such tree structures on the web.

  3. You also have the chdb (constant hash database) extension, which is perfect for this.
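
The flipping suggested in the first comment can be done with array_flip(), which turns the values into keys so that each existence check becomes a hash lookup. A minimal sketch with made-up sample data; note that array_flip() only accepts int or string values (others trigger a warning and are skipped), and duplicate values collapse into a single key, which is fine for a pure existence check.

```php
<?php
// Original list of values, e.g. ip2long() results; duplicates are possible.
$values = [ip2long('10.0.0.1'), ip2long('10.0.0.2'), ip2long('10.0.0.1')];

// Flip once (one O(n) pass); afterwards each lookup is a key access.
// For duplicates, the last position wins as the stored value.
$lookup = array_flip($values);

// array_key_exists() checks the key only. isset() also works here, because the
// flipped values are positions and never null (isset() reports false for null).
var_dump(array_key_exists(ip2long('10.0.0.2'), $lookup)); // bool(true)
var_dump(isset($lookup[ip2long('10.0.0.3')]));            // bool(false)
```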
