Reply to topic  [ 5 posts ] 
Quick question 
Author Message
I haven't seen my friends in so long
User avatar

Joined: Thu Apr 23, 2009 7:35 pm
Posts: 6580
Location: Getting there
Reply with quote
Not a code question, just a concept question...

I give you two arrays and they both contain strings. I want you to return to me an array that contains strings common to both arrays.

How would you do it?

My next post will be the best answer I could come up with in about 3 minutes on the phone to Apple.

_________________
Oliver Foggin - iPhone Dev

JJW009 wrote:
The count will go up until they stop counting. That's the way counting works.


Doodle Sub!
Game Of Life

Image Image


Wed Oct 03, 2012 5:33 pm
Profile WWW
I haven't seen my friends in so long
User avatar

Joined: Thu Apr 23, 2009 7:35 pm
Posts: 6580
Location: Getting there
Reply with quote
Take both array (array1 and array2) and sort their contents alphabetically.

Now store two indices i (for array1) and j (for array2).

Start at i = 0 and iterate through j from j = 0.

When you find a value j where array2[j] is the same string as array1[i] then add that into a third array (resultArray) and increment i by 1 and continue iterating through j.

No repeat this until you reach the end of array2.

When you reach the end of array2 then resultArray will be an array of all strings that occurred in both array1 and array2.

_________________
Oliver Foggin - iPhone Dev

JJW009 wrote:
The count will go up until they stop counting. That's the way counting works.


Doodle Sub!
Game Of Life

Image Image


Wed Oct 03, 2012 5:37 pm
Profile WWW
What's a life?
User avatar

Joined: Thu Apr 23, 2009 7:26 pm
Posts: 17040
Reply with quote
It would probably be quicker while you're sorting alphabetically to generate an index of both tables by first letter, so you only need to iterate your search through strings where at least the first letters match. Course you could do that all the way through but I'm assuming the strings are of arbitrary length.

Or do some kind of fast checksum of each of the strings and compare those - but you'd have to test whether checksum generation & comparison was quicker than just plain string comparisons.


Wed Oct 03, 2012 6:27 pm
Profile
I haven't seen my friends in so long
User avatar

Joined: Thu Apr 23, 2009 9:40 pm
Posts: 5288
Location: ln -s /London ~
Reply with quote
There's all sorts of questions I'd want to know first that'd influence my approach:

  • What are the relative sizes of the arrays - is one massive relative to t'other?
  • How long is each string
  • What sort of number of strings are we talking about?
  • Does either array contain duplicates, and if so how many duplicates?
  • How similar are the strings in each set? Is there a high chance of hash collisions for any reason?

_________________
timark_uk wrote:
Gay sex is better than no sex

timark_uk wrote:
Edward Armitage is Awesome. Yes, that's right. Awesome with a A.


Wed Oct 03, 2012 6:40 pm
Profile
I haven't seen my friends in so long
User avatar

Joined: Thu Apr 23, 2009 6:36 pm
Posts: 5156
Location: /dev/tty0
Reply with quote
What language is it in?

If I could do it in Perl I think I'd do something like:
1. Convert array 1 to a hash.
2. Iterate through array 2 and see if it exists in the hash.

Something like:
Code:
#!/usr/bin/env perl

use 5.010;
use warnings;
use strict;
use File::Slurp;

# Set up variables
my @array1;
my @array2;
my @returnArray;

# Import word lists
my $WORDFILE='/Users/benlavery/scripts/Dict';
@array1 = read_file($WORDFILE);
chomp(@array1);

$WORDFILE='/Users/benlavery/scripts/DictRev';
@array2 = read_file($WORDFILE);
chomp(@array2);

# Convert array1 to a hash
my %hash = map { $_ => 1 } @array1;

# For each item in array2, see if it exists in the hash
# If it does exist, add it to the returnArray array.
foreach my $item(@array2){
    if(exists $hash{$item}){
        push(@returnArray, $item);
    }
}

# Print the returnArray array with each element on a new line
print join("\n",@returnArray),"\n";


The code above runs on my iMac in the following time:
Code:
real   0m0.914s
user   0m0.553s
sys   0m0.063s


Dict and DictRev are files that contain 173122 words. They both contain the same words but DictRev is reversed. Sorting DictRev by general numeric value (sort -g) gets a similar process time.

If I had to do it in Java, I would probably put array1 into a hash table creating hashes of each word, then iterate through array2 creating a hash and seeing if it existed i the hash table.
Not sure on the inbuilt data structures of Cocoa, so not sure on my approach using Objective-C.


Wed Oct 03, 2012 11:53 pm
Profile WWW
Display posts from previous:  Sort by  
Reply to topic   [ 5 posts ] 

Who is online

Users browsing this forum: No registered users and 2 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum

Search for:
Jump to:  
Powered by phpBB® Forum Software © phpBB Group
Designed by ST Software.