Asked  7 Months ago    Answers:  5   Viewed   50 times

I work with huge data files, and sometimes I only need to know the number of lines in them. Usually I open them and read them line by line until I reach the end of the file.

I was wondering if there is a smarter way to do that.

 Answers

24

This is the fastest version I have found so far, about 6 times faster than readLines. On a 150MB log file this takes 0.35 seconds, versus 2.40 seconds when using readLines(). Just for fun, Linux's wc -l command takes 0.15 seconds.

public static int countLinesOld(String filename) throws IOException {
    InputStream is = new BufferedInputStream(new FileInputStream(filename));
    try {
        byte[] c = new byte[1024];
        int count = 0;
        int readChars = 0;
        boolean empty = true;
        while ((readChars = is.read(c)) != -1) {
            empty = false;
            for (int i = 0; i < readChars; ++i) {
                if (c[i] == '\n') {
                    ++count;
                }
            }
        }
        return (count == 0 && !empty) ? 1 : count;
    } finally {
        is.close();
    }
}

EDIT, 9 1/2 years later: I have practically no Java experience, but I tried to benchmark this code against the LineNumberReader solution below, since it bothered me that nobody had done it. Especially for large files, my solution seems to be faster, although it takes a few runs until the optimizer does a decent job. I've played with the code a bit and produced a new version that is consistently fastest:

public static int countLinesNew(String filename) throws IOException {
    InputStream is = new BufferedInputStream(new FileInputStream(filename));
    try {
        byte[] c = new byte[1024];

        int readChars = is.read(c);
        if (readChars == -1) {
            // bail out if nothing to read
            return 0;
        }

        // make it easy for the optimizer to tune this loop
        int count = 0;
        while (readChars == 1024) {
            for (int i=0; i<1024;) {
                if (c[i++] == '\n') {
                    ++count;
                }
            }
            readChars = is.read(c);
        }

        // count remaining characters
        while (readChars != -1) {
            for (int i=0; i<readChars; ++i) {
                if (c[i] == '\n') {
                    ++count;
                }
            }
            readChars = is.read(c);
        }

        return count == 0 ? 1 : count;
    } finally {
        is.close();
    }
}

Benchmark results for a 1.3GB text file, y axis in seconds. I performed 100 runs with the same file and measured each run with System.nanoTime(). You can see that countLinesOld has a few outliers and countLinesNew has none; while it's only a bit faster, the difference is statistically significant. LineNumberReader is clearly slower.

Benchmark Plot
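For reference, the LineNumberReader approach the benchmark compares against is not shown here. A minimal sketch of what such a version might look like, together with a simple System.nanoTime() timing loop, is given below. The class name LineNumberReaderCount, the countLines helper and the choice of 5 runs are assumptions for illustration, not the code that was actually benchmarked.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.StringReader;

class LineNumberReaderCount {

    // Counts line terminators by letting LineNumberReader consume the whole input.
    static int countLines(Reader source) throws IOException {
        try (LineNumberReader reader = new LineNumberReader(source)) {
            // skip() reads through the input, and the reader tracks line
            // terminators as it goes; loop until nothing more can be skipped.
            while (reader.skip(Long.MAX_VALUE) > 0) {
                // keep skipping
            }
            return reader.getLineNumber();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            // No file given: run on a small in-memory sample instead.
            System.out.println(countLines(new StringReader("a\nb\nc\n")));
            return;
        }
        // Time several runs; the first ones are usually slower until the JIT warms up.
        for (int run = 0; run < 5; run++) {
            long start = System.nanoTime();
            int lines = countLines(new FileReader(args[0]));
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("run " + run + ": " + lines + " lines in " + elapsedMs + " ms");
        }
    }
}
```

Timing a handful of runs like this only gives a rough impression; a proper comparison would need many more runs, as in the benchmark described above.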

Tuesday, June 1, 2021
 
Semirix
answered 7 Months ago
86
Number.prototype.countDecimals = function () {
    if(Math.floor(this.valueOf()) === this.valueOf()) return 0;
    return this.toString().split(".")[1].length || 0; 
}

When bound to the prototype, this allows you to get the decimal count (countDecimals();) directly from a number variable.

E.G.

var x = 23.453453453;
x.countDecimals(); // 9

It works by first returning 0 for integers (the Math.floor check), then converting the number to a string, splitting at the . and returning the length of the last part of the array.

If you do not want to bind this to the prototype, you can just use this:

var countDecimals = function (value) {
    if(Math.floor(value) === value) return 0;
    return value.toString().split(".")[1].length || 0; 
}

EDIT by Black:

I have fixed the method so that it also works with smaller numbers like 0.000000001, which toString() renders in exponential notation.

Number.prototype.countDecimals = function () {

    if (Math.floor(this.valueOf()) === this.valueOf()) return 0;

    var str = this.toString();
    // Small numbers are rendered in exponential notation, e.g. "2.7e-13":
    // the result is the mantissa's decimals plus the exponent.
    var parts = str.split("e-");
    var mantissaDecimals = (parts[0].split(".")[1] || "").length;
    if (parts.length === 2) {
        return mantissaDecimals + Number(parts[1]);
    }
    return mantissaDecimals;
}

var x = 23.453453453;
console.log(x.countDecimals()); // 9

var x = 0.0000000001;
console.log(x.countDecimals()); // 10

var x = 0.000000000000270;
console.log(x.countDecimals()); // 14

var x = 101;  // Integer number
console.log(x.countDecimals()); // 0
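
Since most of the other answers on this page are in Java, here is a rough Java analogue of the same idea, offered only as a sketch. It relies on BigDecimal.valueOf(double), which goes through the string form produced by Double.toString, and then reads the scale after stripping trailing zeros. The class name DecimalCounter and the method name countDecimals are assumptions chosen to mirror the JavaScript above.

```java
import java.math.BigDecimal;

class DecimalCounter {

    // Counts the digits after the decimal point in the decimal string
    // that Double.toString produces for the given value.
    static int countDecimals(double value) {
        if (Double.isNaN(value) || Double.isInfinite(value)) {
            throw new IllegalArgumentException("value must be finite");
        }
        // stripTrailingZeros() turns 101.0 into 101, so integers end up with scale <= 0.
        BigDecimal decimal = BigDecimal.valueOf(value).stripTrailingZeros();
        // A negative scale (e.g. for 100.0) means no decimals at all.
        return Math.max(0, decimal.scale());
    }

    public static void main(String[] args) {
        System.out.println(countDecimals(23.453453453));
        System.out.println(countDecimals(0.0000000001));
        System.out.println(countDecimals(101));
    }
}
```

Because the scale already accounts for exponential notation, no special handling of very small numbers is needed in this version.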
Wednesday, July 14, 2021
 
StampyCode
answered 5 Months ago
29

The trick is to use a connection and open it before calling read.table:

con<-file('filename')
open(con)

read.table(con,skip=5,nrow=1) #6-th line
read.table(con,skip=20,nrow=1) #27-th line
...
close(con)

You may also try scan; it is faster and gives more control.

Sunday, August 1, 2021
 
Giovanni
answered 4 Months ago
34

Simply based on the example from your previous question...

Document doc = textArea.getDocument();
Element root = doc.getDefaultRootElement();
Element element = root.getElement(2);
int start = element.getStartOffset();
int end = element.getEndOffset();
System.out.println(doc.getText(start, end - start));

And the runnable code

import java.awt.BorderLayout;
import java.awt.EventQueue;
import java.awt.event.ActionEvent;
import javax.swing.AbstractAction;
import javax.swing.JButton;
import javax.swing.JComponent;
import javax.swing.JFrame;
import javax.swing.JPanel;
import javax.swing.JScrollPane;
import javax.swing.JTextArea;
import javax.swing.WindowConstants;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.Element;

public class ElementEndOffsetTest {

    public JComponent makeUI() {
        String str = "name : andy\n"
                + "birth : jakarta, 1 jan 1990\n"
                + "number id : 01011990 01\n"
                + "age : 26\n"
                + "study : Informatics engineering\n";

        JTextArea textArea = new JTextArea(str);
        textArea.setEditable(false);
        JPanel p = new JPanel(new BorderLayout());
        p.add(new JScrollPane(textArea));
        p.add(new JButton(new AbstractAction("add") {
            @Override
            public void actionPerformed(ActionEvent e) {
                try {
                    Document doc = textArea.getDocument();
                    Element root = doc.getDefaultRootElement();
                    Element element = root.getElement(2);
                    int start = element.getStartOffset();
                    int end = element.getEndOffset();
                    System.out.println(doc.getText(start, end - start));
                } catch (BadLocationException ex) {
                    ex.printStackTrace();
                }
            }
        }), BorderLayout.SOUTH);
        return p;
    }

    public static void main(String[] args) {
        EventQueue.invokeLater(() -> {
            JFrame f = new JFrame();
            f.setDefaultCloseOperation(WindowConstants.EXIT_ON_CLOSE);
            f.getContentPane().add(new ElementEndOffsetTest().makeUI());
            f.setSize(320, 240);
            f.setLocationRelativeTo(null);
            f.setVisible(true);
        });
    }
}
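
The element lookup in the action listener above can also be pulled into a small helper that works on any Document and strips the trailing newline from the returned line. The sketch below uses a PlainDocument directly so it runs without a GUI; the class name DocumentLines and the method name lineAt are hypothetical, chosen only for this illustration.

```java
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.Element;
import javax.swing.text.PlainDocument;

class DocumentLines {

    // Returns the text of the line at the given zero-based index, without its newline.
    static String lineAt(Document doc, int lineIndex) throws BadLocationException {
        Element root = doc.getDefaultRootElement();
        if (lineIndex < 0 || lineIndex >= root.getElementCount()) {
            throw new IndexOutOfBoundsException("no line at index " + lineIndex);
        }
        Element line = root.getElement(lineIndex);
        int start = line.getStartOffset();
        // Clamp the end offset to the document length as a precaution for the last line.
        int end = Math.min(line.getEndOffset(), doc.getLength());
        String text = doc.getText(start, end - start);
        return text.endsWith("\n") ? text.substring(0, text.length() - 1) : text;
    }

    public static void main(String[] args) throws BadLocationException {
        PlainDocument doc = new PlainDocument();
        doc.insertString(0, "name : andy\n"
                + "birth : jakarta, 1 jan 1990\n"
                + "number id : 01011990 01\n", null);
        System.out.println(lineAt(doc, 2));
    }
}
```

With a JTextArea, the same helper could be called as lineAt(textArea.getDocument(), 2).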

Based on your questions, however, I would recommend using a JTable instead; it would be easier. See How to Use Tables for more details.

Friday, August 27, 2021
 
Gabriele Mariotti
answered 4 Months ago
27

If you just want to add the data to an array, you can append each new value as you read it. If the amount of data is not large and you don't need to do it often, that should be fine. I use something like this, as given in this answer: Reading a plain text file in Java

BufferedReader br = new BufferedReader(new FileReader("path/to/file.txt"));
try {
    StringBuilder sb = new StringBuilder();
    String line = br.readLine();

    while (line != null) {
        sb.append(line);
        sb.append(System.lineSeparator());
        line = br.readLine();
    }
    String everything = sb.toString();
} finally {
    br.close();
}

If you are reading numbers, the strings can be converted, for example to integers with int intValue = Integer.parseInt(text);
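
As a minimal sketch of combining the two ideas, the following reads one integer per line and appends each to a list. It assumes the input has one integer per line and skips blank lines; the class name IntegerLines and the method name readIntegers are made up for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class IntegerLines {

    // Reads one integer per line from the source and collects them in order.
    static List<Integer> readIntegers(Reader source) throws IOException {
        List<Integer> values = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String text = line.trim();
                if (text.isEmpty()) {
                    continue; // skip blank lines
                }
                // Integer.parseInt throws NumberFormatException for non-numeric text.
                values.add(Integer.parseInt(text));
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readIntegers(new StringReader("10\n20\n\n30\n"))); // prints [10, 20, 30]
    }
}
```

To read from a file instead, pass new FileReader("path/to/file.txt") as the source.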

Friday, October 22, 2021
 
user35358
answered 2 Months ago