Sometimes, you will need to open or write extremely large XLSX files, and the common routines in openpyxl won’t be able to handle that load. Hopefully, there are two modes that enable you to read and write unlimited amounts of data with (near) constant memory consumption.
Introducing openpyxl.reader.iter_worksheet.IterableWorksheet:
from openpyxl import load_workbook
wb = load_workbook(filename = 'large_file.xlsx', use_iterators = True)
ws = wb.get_sheet_by_name(name = 'big_data') # ws is now an IterableWorksheet
for row in ws.iter_rows(): # it brings a new method: iter_rows()
for cell in row:
print cell.internal_value
Warning
Cells returned by iter_rows() are not regular openpyxl.cell.Cell but openpyxl.reader.iter_worksheet.RawCell.
Here again, the regular openpyxl.worksheet.Worksheet has been replaced by a faster alternative, the openpyxl.writer.dump_worksheet.DumpWorksheet. When you want to dump large amounts of data, you might find optimized writer helpful:
from openpyxl import Workbook
wb = Workbook(optimized_write = True)
ws = wb.create_sheet()
# now we'll fill it with 10k rows x 200 columns
for irow in xrange(10000):
ws.append(['%d' % i for i in xrange(200)])
wb.save('new_big_file.xlsx') # don't forget to save !
Warning