我在 http://gskinner.com/RegExr/ 找到了这个正则表达式模式
,(?=(?:[^"]*"[^"]*")*(?![^"]*"))
这用于模式匹配 CSV 分隔值(更具体地说,是分隔逗号,可以拆分),在该网站上它与我的测试数据配合得很好。您可以在测试时链接的网站的底部面板中看到我认为的 JavaScript 实现。
但是,当我尝试在 C#/.net 中实现此功能时,匹配无法正常工作。 我的实现:
Regex r = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))", RegexOptions.ECMAScript);
//get data...
foreach (string match in r.Split(sr.ReadLine()))
{
//lblDev.Text = lblDev.Text + match + "<br><br><br><p>column:</p><br>";
dtF.Columns.Add(match);
}
//more of the same to get rows
在某些数据行上,结果与上面站点上生成的结果完全匹配,但在其他数据行上,前 6 行左右无法拆分或根本不存在于拆分数组中。
任何人都可以告诉我为什么该模式的行为似乎不一样吗?
我的测试数据:
CategoryName,SubCategoryName,SupplierName,SupplierCode,ProductTitle,Product Company ,ProductCode,Product_Index,ProductDescription,Product BestSeller,ProductDimensions,ProductExpressDays,ProductBrandName,ProductAdditionalText ,ProductPrintArea,ProductPictureRef,ProductThumnailRef,ProductQuantityBreak1 (QB1),ProductQuantityBreak2 (QB2),ProductQuantityBreak3 (QB3),ProductQuantityBreak4 (QB4),ProductPlainPrice1,ProductPlainPrice2,ProductPlainPrice3,ProductPlainPrice4,ProductColourPrice1,ProductColourPrice2,ProductColourPrice3,ProductColourPrice4,ProductExtraColour1,ProductExtraColour2,ProductExtraColour3,ProductExtraColour4,SellingPrice1,SellingPrice2,SellingPrice3,SellingPrice4,ProductCarriageCost1,ProductCarriageCost2,ProductCarriageCost3,ProductCarriageCost4,BLACK,BLUE,WHITE,SILVER,GOLD,RED,YELLOW,GREEN,ProductOtherColors,ProductOrigination,ProductOrganizationCost,ProductCatalogEntry,ProductPageNumber,ProductPersonalisationType1 (PM1),ProductPrintPosition,ProductCartonQuantity,ProductCartonWeight,ProductPricingExpering,NewProduct,ProductSpecialOffer,ProductSpecialOfferEnd,ProductIsActive,ProductRepeatOrigination,ProductCartonDimession,ProductSpecialOffer1,ProductIsExpress,ProductIsEco,ProductIsBiodegradable,ProductIsRecycled,ProductIsSustainable,ProductIsNatural
Audio,Speakers and Headphones,The Prime Time Company,CM5064:In-ear headphones,Silly Buds,,10058,372,"Small, trendy ear buds with excellent sound quality and printing area actually on each ear- piece. Plastic storage box, with room for cables be wrapped around can also be printed.",FALSE,70 x 70 x 20mm,,,,10mm dia,10058.jpg,10058.jpg,100,250,500,1000,2.19,2.13,2.06,1.99,0.1,0.1,0.05,0.05,0.1,0.1,0.05,0.05,3.81,3.71,3.42,3.17,0,0,0,0,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,,30,,TRUE,24,Screen Printed,Earpiece,200,11,,TRUE,,,TRUE,15,,,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE
Audio,Speakers and Headphones,The Prime Time Company,CM5058:Headstart,Head Start,,10060,372,"Lightweight, slimline, foldable and patented headphones ideal for the gym or exercise. These
headphones uniquely hang from the ears giving security, comfort and an excellent sound quality. There is also a secret cable winding facility.",FALSE,130 x 85 x 45mm,,,,30mm dia,10060.jpg,10060.jpg,100,250,500,1000,5.6,5.43,5.26,5.09,0.1,0.1,0.05,0.05,0.1,0.1,0.05,0.05,9.47,8.96,8.24,7.97,0,0,0,0,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,,30,,TRUE,24,Screen Printed,print plate on ear (s),100,11,,TRUE,,,TRUE,15,,,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE
最佳答案
使用正确的工具来完成工作。正则表达式不太适合解析可以有无限数量嵌套引号的 CSV。
改用这个:
快速 CSV 阅读器
http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader
我们在生产代码中使用它。它工作得很好,让你体会到解析是多么复杂。有关复杂性的更多信息,请查看解决方案中包含的 800 多个单元测试。
关于c# - 跨语言更正正则表达式模式,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/16436722/