c# - Correct regex pattern across languages

Tags: c# .net regex

I found this regex pattern at http://gskinner.com/RegExr/

,(?=(?:[^"]*"[^"]*")*(?![^"]*"))

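It is used to match the separators in CSV values (more specifically, the delimiting commas that are safe to split on), and on that site it works well with my test data. While testing, you can see what I believe is the JavaScript implementation in the bottom panel of the linked site.

However, when I try to implement this in C#/.NET, the matching does not work correctly. My implementation: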
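To make the pattern easier to read, here is a sketch of the same regex rewritten with `RegexOptions.IgnorePatternWhitespace`, so each part can carry a comment. The key point is that the lookahead only accepts a comma when an even number of quote characters follows it, all the way to the end of the input. This version drops `RegexOptions.ECMAScript`, because .NET only allows that option together with `IgnoreCase`, `Multiline` and `Compiled`; our assumption is that it makes no difference here, since the pattern uses no `\w`, `\d`, `\s` or backreferences. The class name `CsvCommaPattern` is hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class CsvCommaPattern
{
    // The question's pattern, same meaning, written in x-mode so that
    // unescaped whitespace is ignored and text after # is a comment.
    // Usage: SplitComma.Split("a,\"b,c\",d") returns a, "b,c" and d
    // (Regex.Split keeps the surrounding quotes on the middle field).
    public static readonly Regex SplitComma = new Regex(@"
        ,                         # a literal comma
        (?=                       # that is followed by
          (?:[^""]*""[^""]*"")*   # zero or more complete pairs of quote characters
          (?![^""]*"")            # and then no unpaired quote before the end of the input
        )                         # end of the lookahead
        ", RegexOptions.IgnorePatternWhitespace);
}
```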

Regex r = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))", RegexOptions.ECMAScript);
//get data...
foreach (string match in r.Split(sr.ReadLine()))
{
    //lblDev.Text = lblDev.Text + match + "<br><br><br><p>column:</p><br>";
    dtF.Columns.Add(match);
}

//more of the same to get rows

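On some data rows the results exactly match those produced on the site above, but on other rows the first 6 or so fields either fail to split or are missing from the split array entirely.

Can anyone tell me why the pattern does not seem to behave the same way?

My test data: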

CategoryName,SubCategoryName,SupplierName,SupplierCode,ProductTitle,Product Company ,ProductCode,Product_Index,ProductDescription,Product BestSeller,ProductDimensions,ProductExpressDays,ProductBrandName,ProductAdditionalText ,ProductPrintArea,ProductPictureRef,ProductThumnailRef,ProductQuantityBreak1 (QB1),ProductQuantityBreak2 (QB2),ProductQuantityBreak3 (QB3),ProductQuantityBreak4 (QB4),ProductPlainPrice1,ProductPlainPrice2,ProductPlainPrice3,ProductPlainPrice4,ProductColourPrice1,ProductColourPrice2,ProductColourPrice3,ProductColourPrice4,ProductExtraColour1,ProductExtraColour2,ProductExtraColour3,ProductExtraColour4,SellingPrice1,SellingPrice2,SellingPrice3,SellingPrice4,ProductCarriageCost1,ProductCarriageCost2,ProductCarriageCost3,ProductCarriageCost4,BLACK,BLUE,WHITE,SILVER,GOLD,RED,YELLOW,GREEN,ProductOtherColors,ProductOrigination,ProductOrganizationCost,ProductCatalogEntry,ProductPageNumber,ProductPersonalisationType1 (PM1),ProductPrintPosition,ProductCartonQuantity,ProductCartonWeight,ProductPricingExpering,NewProduct,ProductSpecialOffer,ProductSpecialOfferEnd,ProductIsActive,ProductRepeatOrigination,ProductCartonDimession,ProductSpecialOffer1,ProductIsExpress,ProductIsEco,ProductIsBiodegradable,ProductIsRecycled,ProductIsSustainable,ProductIsNatural
Audio,Speakers and Headphones,The Prime Time Company,CM5064:In-ear headphones,Silly Buds,,10058,372,"Small, trendy ear buds with excellent sound quality and printing area actually on each ear- piece. Plastic storage box, with room for cables be wrapped around can also be printed.",FALSE,70 x 70 x 20mm,,,,10mm dia,10058.jpg,10058.jpg,100,250,500,1000,2.19,2.13,2.06,1.99,0.1,0.1,0.05,0.05,0.1,0.1,0.05,0.05,3.81,3.71,3.42,3.17,0,0,0,0,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,,30,,TRUE,24,Screen Printed,Earpiece,200,11,,TRUE,,,TRUE,15,,,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE
Audio,Speakers and Headphones,The Prime Time Company,CM5058:Headstart,Head Start,,10060,372,"Lightweight, slimline, foldable and patented headphones ideal for the gym or exercise. These
headphones uniquely hang from the ears giving security, comfort and an excellent sound quality. There is also a secret cable winding facility.",FALSE,130 x 85 x 45mm,,,,30mm dia,10060.jpg,10060.jpg,100,250,500,1000,5.6,5.43,5.26,5.09,0.1,0.1,0.05,0.05,0.1,0.1,0.05,0.05,9.47,8.96,8.24,7.97,0,0,0,0,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,,30,,TRUE,24,Screen Printed,print plate on ear (s),100,11,,TRUE,,,TRUE,15,,,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE

Best answer

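Use the right tool for the job. Regular expressions are not well suited to parsing CSV, which can contain an unlimited number of nested quotes.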
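A likely cause of the symptom in the question, which is our own reading of the test data rather than something stated in this answer: the second data record's quoted description field contains a line break, so `sr.ReadLine()` returns only the first physical line of that record. On that truncated line there is a single, unpaired quote, so the lookahead rejects every comma before it and those fields stay glued together; on RegExr the whole pasted text was one input, so the quotes still paired up. If that reading is right, the .NET engine is probably not to blame. The sketch below reproduces this with the question's own pattern and a shortened, made-up record; the class `ReadLineDemo` and its methods are hypothetical.

```csharp
using System.IO;
using System.Text.RegularExpressions;

public static class ReadLineDemo
{
    // Same pattern and option as in the question.
    static readonly Regex SplitComma =
        new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))", RegexOptions.ECMAScript);

    // Splits only the first physical line, as the question's ReadLine() loop does.
    public static string[] SplitFirstLine(string csvText)
    {
        using (var sr = new StringReader(csvText))
        {
            return SplitComma.Split(sr.ReadLine());
        }
    }

    // Splits the whole text at once, which is closer to what RegExr sees.
    public static string[] SplitWholeText(string csvText)
    {
        return SplitComma.Split(csvText);
    }
}
```

With the record `Audio,Head Start,"Lightweight, foldable. These` + line break + `headphones hang from the ears.",FALSE`, `SplitFirstLine` returns only two pieces, `Audio,Head Start,"Lightweight` and ` foldable. These`, while `SplitWholeText` returns the four expected fields (with the quotes still attached to the third one).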

Use this instead:

A Fast CSV Reader

http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader

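We use it in production code. It works really well and gives you an appreciation of how complex parsing is. For more on that complexity, look at the 800+ unit tests included in the solution.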
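The Fast CSV Reader's own API is not reproduced here. As a rough illustration of what a quote-aware reader has to do that a per-line regex split cannot, here is a minimal hand-rolled state machine (a swapped-in technique, not the library's code; `SimpleCsv.ReadRecord` is a hypothetical name). It reads one record at a time from a `TextReader`, handles quoted fields, doubled quotes (`""`) inside quotes, and line breaks inside quotes, but it is lenient about malformed input and covers far fewer edge cases than the library's test suite.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class SimpleCsv
{
    // Reads one CSV record from the reader, or returns null at end of input.
    public static List<string> ReadRecord(TextReader reader)
    {
        if (reader.Peek() == -1) return null;

        var fields = new List<string>();
        var field = new StringBuilder();
        bool inQuotes = false;

        while (true)
        {
            int c = reader.Read();
            if (c == -1) break;                          // end of input ends the record

            char ch = (char)c;
            if (inQuotes)
            {
                if (ch == '"')
                {
                    if (reader.Peek() == '"') { field.Append('"'); reader.Read(); } // "" becomes "
                    else inQuotes = false;               // closing quote
                }
                else field.Append(ch);                   // commas and line breaks are literal here
            }
            else if (ch == '"') inQuotes = true;         // opening quote
            else if (ch == ',') { fields.Add(field.ToString()); field.Clear(); }
            else if (ch == '\r' || ch == '\n')
            {
                if (ch == '\r' && reader.Peek() == '\n') reader.Read(); // treat CRLF as one break
                break;                                   // unquoted line break ends the record
            }
            else field.Append(ch);
        }

        fields.Add(field.ToString());
        return fields;
    }
}
```

In the question's code this would replace the `r.Split(sr.ReadLine())` call, for example by looping over `SimpleCsv.ReadRecord(sr)` until it returns null, so a record whose quoted field spans several physical lines is still read as one row.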

Regarding c# - Correct regex pattern across languages, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/16436722/
