I am trying to read Arabic text using Java, but the Scanner does not see any elements, so reading fails, even though LineNumberReader recognizes the lines in the text file.
I have tried the same code on English text and it works fine.
I am using NetBeans 7.0.1.
Here is my code:
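import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.LineNumberReader;
import java.util.Arrays;
import java.util.Scanner;
import java.util.StringTokenizer;

public class ReadFile {
    private int number_of_words;
    private File f1;
    private String array[][], lines[];
    private Scanner scan1;

    public ReadFile(String sf1) throws FileNotFoundException
    {
        f1 = new File(sf1);
        // No charset is given here, so the platform default encoding is used.
        scan1 = new Scanner(f1);
    }

    public String[][] getA()
    {
        return array;
    }

    public void read() throws IOException
    {
        int counter = 0, i = 0;
        // Count the lines by skipping to the end of the file.
        LineNumberReader lnr = new LineNumberReader(new FileReader(f1));
        lnr.skip(Long.MAX_VALUE);
        number_of_words = lnr.getLineNumber();
        array = new String[2][number_of_words];
        lines = new String[number_of_words];
        // Read every line through the Scanner.
        while (scan1.hasNext())
        {
            String temp;
            temp = scan1.nextLine();
            lines[counter++] = temp;
            System.out.println(lines[counter - 1] + "\t" + lines.length);
        }
        Arrays.sort(lines);
        counter = 0;
        // Split each sorted line into its two tab-separated fields.
        while (i < lines.length)
        {
            String temp = lines[i++];
            StringTokenizer tk = new StringTokenizer(temp, "\t");
            array[0][counter] = tk.nextToken();
            array[1][counter++] = tk.nextToken();
        }
    }
}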
By default, Scanner uses the system encoding. You need to specify the correct character encoding when reading data that contains special characters.
scan1 = new Scanner(f1, "UTF-8");
If UTF-8 does not work, try an Arabic-specific encoding, for example windows-1256 or ISO-8859-6.
Here are a couple of links that may be useful: File reading practices and Java supported encodings.
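To make the suggestion above concrete, here is a minimal sketch of reading a file through a Scanner with an explicit charset. It assumes the file really is saved as UTF-8; the class name ScannerCharsetDemo, the helper readLines and the sample data are made up for illustration and are not part of the original code.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerCharsetDemo {

    // Reads every line of the file, decoding its bytes with the named charset.
    // (Hypothetical helper, not part of the original ReadFile class.)
    static List<String> readLines(File file, String charsetName) throws IOException {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(file, charsetName)) {
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine());
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample data: two tab-separated Arabic/English pairs, saved as UTF-8.
        File sample = File.createTempFile("arabic-sample", ".txt");
        sample.deleteOnExit();
        String content = "\u0643\u062A\u0627\u0628\tbook\n\u0642\u0644\u0645\tpen\n";
        Files.write(sample.toPath(), content.getBytes(StandardCharsets.UTF_8));

        for (String line : readLines(sample, "UTF-8")) {
            System.out.println(line);
        }
    }
}
```

The sketch pairs hasNextLine() with nextLine() rather than hasNext(), since the two calls then agree on what counts as remaining input. If the file was saved in another encoding, pass that charset name instead of "UTF-8".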
Try reading the file with this:
FileInputStream fis = new FileInputStream(f1);
LineNumberReader lnr = new LineNumberReader(new InputStreamReader(fis, "UTF-8"));
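As a rough sketch of how the two lines above could be used end to end, the following reads all lines through a LineNumberReader that decodes UTF-8. It assumes the file is UTF-8 encoded; the class name ReaderCharsetDemo and the helper readLines are hypothetical names chosen for this example.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.LineNumberReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

public class ReaderCharsetDemo {

    // Reads all lines through a reader that decodes the bytes as UTF-8.
    // (Hypothetical helper, not part of the original ReadFile class.)
    static List<String> readLines(File file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (LineNumberReader reader = new LineNumberReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample data written as UTF-8 so the example is self-contained.
        File sample = File.createTempFile("arabic-reader", ".txt");
        sample.deleteOnExit();
        String content = "\u0642\u0644\u0645\tpen\n\u0643\u062A\u0627\u0628\tbook\n";
        Files.write(sample.toPath(), content.getBytes(StandardCharsets.UTF_8));

        List<String> lines = readLines(sample);
        System.out.println(lines.size() + " lines read");
        for (String line : lines) {
            System.out.println(line);
        }
    }
}
```

Collecting the lines into a list in a single pass also avoids counting the lines first and then reading the file a second time.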
You need to use the right Charset when reading the file.
Scanner(System.in, "UTF-8")
is most probably what you are looking for.
Cheers, Eugene.
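To illustrate why the charset matters, here is a small sketch that decodes the same UTF-8 bytes twice, once with the matching charset and once with a mismatched one. The class name CharsetMismatchDemo and the sample word are made up for this example, and ISO-8859-1 merely stands in for whatever a non-UTF-8 system default might be.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    // Decodes raw bytes into text using the given charset.
    // (Hypothetical helper for illustration only.)
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The Arabic word "مرحبا", written with Unicode escapes.
        String original = "\u0645\u0631\u062D\u0628\u0627";
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        // The charset Java falls back to when none is specified.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // Decoding with the charset the bytes were written in restores the text.
        String right = decode(utf8Bytes, StandardCharsets.UTF_8);
        System.out.println("UTF-8 round trip matches: " + right.equals(original));

        // Decoding with a mismatched charset turns each byte into its own character.
        String wrong = decode(utf8Bytes, StandardCharsets.ISO_8859_1);
        System.out.println("ISO-8859-1 decode matches: " + wrong.equals(original));
    }
}
```

Each of these Arabic letters takes two bytes in UTF-8, so the mismatched decode yields ten unrelated characters instead of five letters.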