php - How to read large worksheets from large Excel files (27MB+) with PHPExcel?

Tags: php, phpexcel

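I have large Excel worksheets that I want to be able to read into MySQL using PHPExcel.

I am using a recent patch that lets you read a worksheet without opening the whole file. This way I can read one worksheet at a time.

However, one Excel file is 27MB. I can read the first worksheet successfully because it is small, but the second worksheet is so large that the cron job that started the process at 22:00 had not finished by 8:00 AM. The worksheet is simply too big.

Is there any way to read a worksheet in batches of rows, e.g. something like this: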

$inputFileType = 'Excel2007';
$inputFileName = 'big_file.xlsx';
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
$worksheetNames = $objReader->listWorksheetNames($inputFileName);

foreach ($worksheetNames as $sheetName) {
    //BELOW IS "WISH CODE":
    for ($row = 1; $row <= $max_rows; $row += 100) {
        $dataset = $objReader->getWorksheetWithRows($row, $row+100);
        save_dataset_to_database($dataset);
    }
}

Addendum

@mark, I used the code you posted to create the following example:

function readRowsFromWorksheet() {

    $file_name = htmlentities($_POST['file_name']);
    $file_type = htmlentities($_POST['file_type']);

    echo 'Read rows from worksheet:<br />';
    debug_log('----------start');
    $objReader = PHPExcel_IOFactory::createReader($file_type);
    $chunkSize = 20;
    $chunkFilter = new ChunkReadFilter();
    $objReader->setReadFilter($chunkFilter);

    for ($startRow = 2; $startRow <= 240; $startRow += $chunkSize) {
        $chunkFilter->setRows($startRow, $chunkSize);
        $objPHPExcel = $objReader->load('data/' . $file_name);
        debug_log('reading chunk starting at row '.$startRow);
        $sheetData = $objPHPExcel->getActiveSheet()->toArray(null, true, true, true);
        var_dump($sheetData);
        echo '<hr />';
    }
    debug_log('end');
}

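As the following log file shows, it runs fine on a small 8K Excel file, but when I run it on a 3 MB Excel file it never gets past the first chunk. Is there any way to optimize this code for performance? Otherwise it does not look performant enough to get chunks out of a large Excel file: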

2011-01-12 11:07:15: ----------start
2011-01-12 11:07:15: reading chunk starting at row 2
2011-01-12 11:07:15: reading chunk starting at row 22
2011-01-12 11:07:15: reading chunk starting at row 42
2011-01-12 11:07:15: reading chunk starting at row 62
2011-01-12 11:07:15: reading chunk starting at row 82
2011-01-12 11:07:15: reading chunk starting at row 102
2011-01-12 11:07:15: reading chunk starting at row 122
2011-01-12 11:07:15: reading chunk starting at row 142
2011-01-12 11:07:15: reading chunk starting at row 162
2011-01-12 11:07:15: reading chunk starting at row 182
2011-01-12 11:07:15: reading chunk starting at row 202
2011-01-12 11:07:15: reading chunk starting at row 222
2011-01-12 11:07:15: end
2011-01-12 11:07:52: ----------start
2011-01-12 11:08:01: reading chunk starting at row 2
(...at 11:18, CPU usage at 93% still running...)

Addendum 2

When I comment out:

//$sheetData = $objPHPExcel->getActiveSheet()->toArray(null, true, true, true);
//var_dump($sheetData);

then it parses at an acceptable speed (about 2 rows per second). Is there any way to improve the performance of toArray()?

2011-01-12 11:40:51: ----------start
2011-01-12 11:40:59: reading chunk starting at row 2
2011-01-12 11:41:07: reading chunk starting at row 22
2011-01-12 11:41:14: reading chunk starting at row 42
2011-01-12 11:41:22: reading chunk starting at row 62
2011-01-12 11:41:29: reading chunk starting at row 82
2011-01-12 11:41:37: reading chunk starting at row 102
2011-01-12 11:41:45: reading chunk starting at row 122
2011-01-12 11:41:52: reading chunk starting at row 142
2011-01-12 11:42:00: reading chunk starting at row 162
2011-01-12 11:42:07: reading chunk starting at row 182
2011-01-12 11:42:15: reading chunk starting at row 202
2011-01-12 11:42:22: reading chunk starting at row 222
2011-01-12 11:42:22: end

Addendum 3

This seems to work adequately, at least on the 3 MB file:

for ($startRow = 2; $startRow <= 240; $startRow += $chunkSize) {
    echo 'Loading WorkSheet using configurable filter for headings row 1 and for rows ', $startRow, ' to ', ($startRow + $chunkSize - 1), '<br />';
    $chunkFilter->setRows($startRow, $chunkSize);
    $objPHPExcel = $objReader->load('data/' . $file_name);
    debug_log('reading chunk starting at row ' . $startRow);
    foreach ($objPHPExcel->getActiveSheet()->getRowIterator() as $row) {
        $cellIterator = $row->getCellIterator();
        $cellIterator->setIterateOnlyExistingCells(false);
        echo '<tr>';
        foreach ($cellIterator as $cell) {
            if (!is_null($cell)) {
                //$value = $cell->getCalculatedValue();
                $rawValue = $cell->getValue();
                debug_log($rawValue);
            }
        }
    }
}

Best answer

It is possible to read a worksheet in "chunks" using Read Filters, although I can make no guarantees about efficiency.

$inputFileType = 'Excel5';
$inputFileName = './sampleData/example2.xls';


/**  Define a Read Filter class implementing PHPExcel_Reader_IReadFilter  */
class chunkReadFilter implements PHPExcel_Reader_IReadFilter
{
    private $_startRow = 0;

    private $_endRow = 0;

    /**  Set the list of rows that we want to read  */
    public function setRows($startRow, $chunkSize) {
        $this->_startRow    = $startRow;
        $this->_endRow        = $startRow + $chunkSize;
    }

    public function readCell($column, $row, $worksheetName = '') {
        //  Only read the heading row, and the rows that are configured in $this->_startRow and $this->_endRow
        if (($row == 1) || ($row >= $this->_startRow && $row < $this->_endRow)) {
            return true;
        }
        return false;
    }
}


echo 'Loading file ',pathinfo($inputFileName,PATHINFO_BASENAME),' using IOFactory with a defined reader type of ',$inputFileType,'<br />';
/**  Create a new Reader of the type defined in $inputFileType  **/
$objReader = PHPExcel_IOFactory::createReader($inputFileType);

echo '<hr />';


/**  Define how many rows we want to read for each "chunk"  **/
$chunkSize = 20;
/**  Create a new Instance of our Read Filter  **/
$chunkFilter = new chunkReadFilter();

/**  Tell the Reader that we want to use the Read Filter that we've Instantiated  **/
$objReader->setReadFilter($chunkFilter);

/**  Loop to read our worksheet in "chunk size" blocks  **/
/**  $startRow is set to 2 initially because we always read the headings in row #1  **/

for ($startRow = 2; $startRow <= 240; $startRow += $chunkSize) {
    echo 'Loading WorkSheet using configurable filter for headings row 1 and for rows ',$startRow,' to ',($startRow+$chunkSize-1),'<br />';
    /**  Tell the Read Filter, the limits on which rows we want to read this iteration  **/
    $chunkFilter->setRows($startRow,$chunkSize);
    /**  Load only the rows that match our filter from $inputFileName to a PHPExcel Object  **/
    $objPHPExcel = $objReader->load($inputFileName);

    //    Do some processing here

    $sheetData = $objPHPExcel->getActiveSheet()->toArray(null,true,true,true);
    var_dump($sheetData);
    echo '<br /><br />';
}

Note that this Read Filter will always read the first row of the worksheet, as well as the rows defined by the chunk rule.
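To make the row selection concrete, here is a minimal sketch of the same test that readCell() performs, pulled out into a stand-alone function so it runs without PHPExcel. The function name rowIsInChunk() and the sample range of 50 rows are assumptions made for illustration; they are not part of the library or of the answer's code.

```php
<?php
// Hypothetical stand-alone copy of the row test used in chunkReadFilter::readCell().
// It does not depend on PHPExcel, so it can be run on its own.
function rowIsInChunk(int $row, int $startRow, int $chunkSize): bool
{
    // Same arithmetic as setRows(): the end row is an exclusive upper bound.
    $endRow = $startRow + $chunkSize;

    // Row 1 (the headings) always passes; other rows must fall inside the chunk.
    return $row === 1 || ($row >= $startRow && $row < $endRow);
}

// Which of rows 1..50 would be loaded for the chunk starting at row 22 with 20 rows?
$kept = array_values(array_filter(range(1, 50), function (int $row): bool {
    return rowIsInChunk($row, 22, 20);
}));

echo 'Rows kept: ', implode(', ', $kept), PHP_EOL;
```

For that chunk the filter keeps row 1 plus rows 22 to 41, which is why every chunk returned by toArray() also contains the headings.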

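When you use a Read Filter, PHPExcel still parses the entire file, but only loads those cells that match the defined Read Filter, so it only uses the memory required by that number of cells. However, it parses the file multiple times, once for each chunk, so it will be slower. This example reads 20 rows at a time: to read row by row, simply set $chunkSize to 1.

This can also cause problems if you have formulae that reference cells in different "chunks", because the data simply isn't available for cells outside of the current "chunk".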
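Because every chunk triggers one full parse of the file, the chunk size directly determines how many times the file is parsed. The sketch below only does that arithmetic; chunkStartRows() is a hypothetical helper, not a PHPExcel function, and the last row of 240 is simply the bound hard-coded in the loop above.

```php
<?php
// Hypothetical helper: list the start row of every chunk for a sheet whose
// headings are in row 1 and whose data ends at $lastRow.
function chunkStartRows(int $lastRow, int $chunkSize, int $firstDataRow = 2): array
{
    if ($chunkSize < 1) {
        throw new InvalidArgumentException('Chunk size must be at least 1.');
    }

    $starts = [];
    // Same stepping as the for loop in the example: advance by one chunk each time.
    for ($startRow = $firstDataRow; $startRow <= $lastRow; $startRow += $chunkSize) {
        $starts[] = $startRow;
    }

    return $starts;
}

// Each entry corresponds to one $objReader->load() call, i.e. one full parse of the file.
$starts = chunkStartRows(240, 20);
echo count($starts), ' loads, starting at rows ', implode(', ', $starts), PHP_EOL;
```

With 240 rows and a chunk size of 20 this gives 12 loads, matching the 12 chunk lines in the question's logs. Setting the chunk size to 1 would mean 239 loads of the same file, so a larger chunk size trades memory for fewer parses.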

Regarding "php - How to read large worksheets from large Excel files (27MB+) with PHPExcel?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/4666746/
