Recently I needed to open a big file (an Apache log, 14 GB or so) and cut some information out of it. Of course, using the
file.read() and/or
file.readlines() methods wasn't possible. On the other hand, calling
file.readline() a few (actually more than 20) million times didn't sound right either. So I looked for another solution and found that you can limit the size of
readlines().
f = open('filename', 'r')
opensize = 2**27              # read roughly 128 MB of lines per call
longlist = []
while 1:
    # readlines() with a size hint returns a batch of whole lines
    # totalling approximately opensize bytes (an empty list at EOF)
    shortlist = [[l.split()[n] for n in [0, 4, -2, -1]]
                 for l in f.readlines(opensize)]
    if not shortlist:
        break
    else:
        longlist.extend(shortlist)
f.close()
The script opens the 'filename' file and then, in a loop:
- reads a batch of lines from that file totalling roughly 128 MB (2**27 bytes),
- cuts the first, fifth, second-to-last and last columns from each line (see the small example after this list),
- appends the resulting (temporary) list to the output list.
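To make the column selection concrete, here is a minimal sketch of what the inner list comprehension does to a single line. The sample log line below is made up; which columns you actually want in a real Apache log depends on the configured LogFormat.

# a hypothetical common-log-format style line; real logs may differ
line = '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'
fields = line.split()
# first, fifth, second-to-last and last whitespace-separated fields
row = [fields[n] for n in [0, 4, -2, -1]]
print(row)   # ['127.0.0.1', '-0700]', '200', '2326']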
It's worth noting that once
shortlist
comes back empty (readlines() returns an empty list at the end of the file), the script leaves the loop (the if not shortlist: / break part).
It's not obligatory, but I like to work with powers of 2, hence opensize=2**27.
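For a 14 GB log, keeping every extracted row in longlist can still eat a lot of memory. Below is a minimal sketch (my own variation, not part of the script above) of a generator-based version that yields the same rows chunk by chunk, so they can be processed or written out without holding them all in RAM; the filename and column indices are just placeholders.

def iter_rows(filename, opensize=2**27, columns=(0, 4, -2, -1)):
    # yield the selected columns of each line, reading the file
    # in roughly opensize-byte batches of whole lines
    f = open(filename, 'r')
    try:
        while 1:
            lines = f.readlines(opensize)
            if not lines:
                break
            for l in lines:
                fields = l.split()
                yield [fields[n] for n in columns]
    finally:
        f.close()

# example use: count rows without building one huge list
count = 0
for row in iter_rows('filename'):
    count += 1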