I am a student on a summer internship. My task is to handle data entry from Excel into a SQL Server database for surveys conducted over several years. The task is outlined as follows:
There are three tables: a main event, an individual event, and an individual. An event has many individual events, and an individual event has many individuals. My code concerns only the last two tables.
I read two files: a list of all individual events in one file, and a list of all individuals in the other. Each individual's data tells me which individual event it is associated with.
My code essentially reads one individual event, then searches the second file for any associated individuals. Each line of the individuals file is inserted into the correct table if it is associated, and written to a new file otherwise. Once the whole file has been traversed, the new file is copied over the old one, removing the data that has already been entered into the database.
Compared with simply re-reading the full individuals file over and over, this copying cut about 3 minutes off the execution time. But is there a better way? The run on my sample data takes about 47 seconds... ideally I would like that to be lower.
Any suggestions, however trivial, would be greatly appreciated.
Edit: here is a cut-down version of the code I am using.
<?php
//not shown:
//  connect to database ($con)
//  input event data
//  get the id of the event ($event_id)

//open the survey (individual event) file and take a working copy of the individuals file
$s_handle = fopen($_FILES['surveyfile']['tmp_name'], 'r');
copy($_FILES['cocklefile']['tmp_name'], 'file1.csv');

//read one individual event per line, insert it, then insert its individuals
$s_csv = fgetcsv($s_handle, 0, ','); //length 0 = no line-length limit
while ($s_csv !== false)
{
    //replace any missing fields with empty strings
    foreach ($s_csv as $key => $val) {
        if ($val === null) {
            $s_csv[$key] = '';
        }
    }
    $grid_no = $s_csv[0];
    $sub_loc = $s_csv[1];
    /*
    ...define more variables
    */

    //NOTE: interpolating values straight into the SQL string is vulnerable to
    //SQL injection and breaks on embedded quotes; prepared statements would be safer
    $sql = "INSERT INTO independant_event"
         . "(parent_id,grid_number,sub_location,....)"
         . "VALUES ("
         . "'{$event_id}',"
         . "'{$grid_no}',"
         //...
         . ");";
    if (!odbc_exec($con, $sql)) {
        echo "WARNING: SQL INSERT INTO independant_event FAILED. PHP.";
    }

    //get the ID of the row just inserted
    //(MAX() is race-prone if anything else inserts concurrently)
    $sql = "SELECT MAX(ind_event_id) FROM independant_event";
    $return = odbc_exec($con, $sql);
    $ind_event_id = odbc_result($return, 1);

    //insert individuals: lines matching this event go into the database,
    //all other lines are carried over to file2.csv
    $c_2 = fopen('file2.csv', 'w');
    $c_1 = fopen('file1.csv', 'r');
    $c_csv = fgetcsv($c_1, 0, ',');
    while ($c_csv !== false)
    {
        for ($i = 0; $i < 9; $i++) { //make sure there is a value in each column
            if (!isset($c_csv[$i])) {
                $c_csv[$i] = '';
            }
        }
        //give values meaningful names
        $stat_no = $c_csv[0];
        $sample_method = $c_csv[1];
        //....

        //check whether the current line corresponds to the current station
        if (strcasecmp($stat_no, $grid_no) == 0) {
            $sql = "INSERT INTO fssbur2.cockle"
                 . "(parent_id,sampling_method,shell_height,shell_width,age,weight,alive,discarded,damage)"
                 . "VALUES("
                 . "'{$ind_event_id}',"
                 . "'{$sample_method}',"
                 //...
                 . "'{$damage}');";
            if (!odbc_exec($con, $sql)) {
                echo "WARNING: SQL INSERT INTO fssbur2.cockle FAILED. PHP.";
            }
        } else { //no correspondence: write the line to the new file
            fputcsv($c_2, $c_csv);
        }
        $c_csv = fgetcsv($c_1, 0, ','); //get the next line
    } //end while: c_2 now holds only the unused individuals

    fclose($c_1);
    fclose($c_2);
    copy('file2.csv', 'file1.csv'); //copy new file over the old, removing used data

    $s_csv = fgetcsv($s_handle, 0, ',');
} //end while over individual events

fclose($s_handle);
?>
Best answer
I may not have fully understood the process, but why not insert the entire CSV into a database table? It may seem like wasted effort, but it will probably pay off: once the initial import is done, finding all individuals associated with an event should be much faster, because the DBMS can use an index to speed up those lookups (compared with a linear, file-based traversal). To be precise: your "individual" table would presumably have a foreign key into your "individual_event" table. As long as you create an index on this foreign key, lookups will be noticeably faster (it is possible that simply declaring the field as a foreign key causes SQL Server to index it automatically, but I cannot say for sure, as I do not really use MSSQL).
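A minimal T-SQL sketch of this approach; the staging table, file path, and column names are illustrative, not taken from the question:

```sql
-- Hypothetical staging table holding the raw individuals CSV.
CREATE TABLE cockle_staging (
    station_no      VARCHAR(20),
    sampling_method VARCHAR(50)
    -- ...remaining CSV columns...
);

-- Load the whole file in one statement instead of row-by-row inserts
-- (path and terminators depend on your environment).
BULK INSERT cockle_staging
FROM 'C:\data\individuals.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

-- Index the column used to match individuals to their event.
CREATE INDEX ix_staging_station ON cockle_staging (station_no);

-- One set-based insert per event replaces the per-line PHP loop.
INSERT INTO fssbur2.cockle (parent_id, sampling_method /* , ... */)
SELECT @ind_event_id, sampling_method /* , ... */
FROM cockle_staging
WHERE station_no = @grid_no;
```

The final INSERT ... SELECT could even be run once for all events at the same time, by joining the staging table to the individual-event table on the station/grid number instead of filtering on a single value.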
By the way, how many records are we talking about? If we are dealing with thousands of records, it is entirely reasonable to expect this sort of thing to run in a few seconds.
This question, "php - Am I taking the correct approach to handling these files? (CSV with PHP)", originally appeared on Stack Overflow: https://stackoverflow.com/questions/6569563/