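I have a server-side log folder containing hundreds of logs, most of which sit in subdirectories named after the machine the logs came from. The task is to extract the name of the latest file in each directory containing a particular string (not all files have this string) so that analysis can be done per machine. I have included my attempt below but it seems rather clunky and long-winded, and I wonder if there is an easier/better/faster/more efficient way of doing this, maybe with linq?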
void Main()
{
    string SourcePath = @"L:\machinelogs";
    string filemask = "*.log";
    string searchitem = @"cannot access server data";
    List<string> fileswithsearchitem = new List<string>();
    DirectoryInfo directory = new DirectoryInfo(SourcePath);
    IEnumerable<DirectoryInfo> dirs = directory.EnumerateDirectories("*", new EnumerationOptions() { RecurseSubdirectories = true, IgnoreInaccessible = true });
    // Append returns a new sequence, so the result has to be assigned;
    // otherwise the root directory itself is never searched.
    dirs = dirs.Append(directory);
    foreach (var dir in dirs)
    {
        var found = false;
        var files = dir.EnumerateFiles(filemask);
        // Newest file first, so the first hit is the latest file containing the string.
        foreach (var file in files.OrderByDescending(f => f.CreationTime).ToList())
        {
            foreach (var line in File.ReadLines(file.FullName))
            {
                if (line.Contains(searchitem))
                {
                    fileswithsearchitem.Add(file.FullName + " : " + line);
                    found = true;
                    break;
                }
            }
            if (found)
            {
                break;
            }
        }
    }
    foreach (string item in fileswithsearchitem)
    {
        Console.WriteLine(item);
    }
}
Best answer
I wonder if there is an easier/better/faster/more efficient way of doing this
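I second the suggestion to post your question on https://codereview.stackexchange.com/. I'm not being mean or hostile. Asking for "an easier/better/faster/more efficient way" is asking for a code review, and you will get better answers there. With that said...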
...maybe with linq?
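I have never seen Linq make anything execute faster. In fact, the only performance differences I have noticed were for the worse. On the other hand, it does help make code more expressive, so I see Linq as a trade-off. In this case, yes, it may well be worth it for you.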
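For what it's worth, here is roughly what a Linq version of your loop could look like. Treat it as a minimal sketch, not a benchmarked replacement: the names LatestMatchFinder and FindLatestMatches are made up for illustration, and it keeps your choice of CreationTime to decide which file is newest.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class LatestMatchFinder
{
    // Hypothetical helper: for each directory under root (root included), returns the
    // newest file matching `mask` that contains `searchItem`, plus the first matching line.
    public static IEnumerable<(string Path, string Line)> FindLatestMatches(
        string root, string mask, string searchItem)
    {
        var rootDir = new DirectoryInfo(root);
        var options = new EnumerationOptions { RecurseSubdirectories = true, IgnoreInaccessible = true };

        return rootDir.EnumerateDirectories("*", options)
            .Append(rootDir) // the result of Append is used, so root is searched too
            .Select(dir => dir.EnumerateFiles(mask)
                .OrderByDescending(f => f.CreationTime)
                // File.ReadLines is lazy: reading stops at the first matching line.
                .Select(f => (Path: f.FullName,
                              Line: File.ReadLines(f.FullName)
                                        .FirstOrDefault(l => l.Contains(searchItem))))
                // First file (newest first) that had a matching line, else (null, null).
                .FirstOrDefault(hit => hit.Line != null))
            // Drop directories where no file contained the search text.
            .Where(hit => hit.Line != null);
    }
}
```

Each directory contributes at most one tuple, and older files in a directory are not opened once a newer one has matched. Whether this reads better than the nested loops is a matter of taste; I would not expect it to run faster.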
The task is to extract the name of the latest file in each directory containing a particular string (not all files have this string) so that analysis can be done per machine.
I have included my attempt below but it seems rather clunky and long-winded
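The code you wrote is not reusable; instead, it requires:
- a drive letter named "L" (uncommon)
- a drive letter and path separated by \ (this only happens on Windows)
- log file names with the extension ".log"
- the search text to be "cannot access server data"
- the text to appear on STDOUT
- a real file system
But are these really problems for you? Is abstracting them away worth the time and effort? If you have something that works, why fix it? I can't answer these questions for you, but you will have to do some self-reflection.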
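Abstractions
Here are some possible abstractions, and some possible ways to use them.
Abstracting the file system
Having an abstract file system makes it easier to write automated tests, so you can make sure your code keeps working as it changes over the coming years.
These are the methods I see you using:
- enumerating directories
- enumerating files
With enough hand-waving, your code might look like this: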
interface IDirectory
{
    /// <summary>
    /// Recursively yields all accessible nested directories
    /// </summary>
    IEnumerable<IDirectory> EnumerateDirectories();
    /// <summary>
    /// Yields all file paths that match the given mask. Yields them in order of
    /// newest first.
    /// </summary>
    IEnumerable<string> EnumerateFiles(string mask);
}
interface IFileSystem
{
    IDirectory OpenDirectory(string path);
    Stream OpenFile(string path);
}
class DirectoryInfoAdapter : IDirectory
{
    readonly DirectoryInfo _info;
    public DirectoryInfoAdapter(DirectoryInfo info) => _info = info;
    public IEnumerable<IDirectory> EnumerateDirectories() => _info
        .EnumerateDirectories("*", new EnumerationOptions() { RecurseSubdirectories = true, IgnoreInaccessible = true })
        .Select(x => new DirectoryInfoAdapter(x));
    public IEnumerable<string> EnumerateFiles(string mask) => _info
        .EnumerateFiles(mask)
        .OrderByDescending(x => x.CreationTime) // newest first, as the interface promises
        .Select(x => x.FullName);
}
class RealFileSystem : IFileSystem
{
    public IDirectory OpenDirectory(string path) => new DirectoryInfoAdapter(new DirectoryInfo(path));
    // File.Open has no single-argument overload; OpenRead opens the file read-only.
    public Stream OpenFile(string path) => File.OpenRead(path);
}
void DoStuff(IFileSystem fileSystem)
{
    string SourcePath = @"L:\machinelogs";
    string filemask = "*.log";
    string searchitem = @"cannot access server data";
    List<string> fileswithsearchitem = new List<string>();
    IDirectory directory = fileSystem.OpenDirectory(SourcePath);
    IEnumerable<IDirectory> dirs = directory.EnumerateDirectories();
    // Append returns a new sequence; assign it so the root directory is searched too.
    dirs = dirs.Append(directory);
    foreach (var dir in dirs)
    {
        var found = false;
        // Files arrive newest first (see IDirectory.EnumerateFiles).
        var files = dir.EnumerateFiles(filemask);
        foreach (var file in files)
        {
            using var stream = fileSystem.OpenFile(file);
            using var reader = new StreamReader(stream);
            while (reader.ReadLine() is {} line)
            {
                if (line.Contains(searchitem))
                {
                    fileswithsearchitem.Add(file + " : " + line);
                    found = true;
                    break;
                }
            }
            if (found)
            {
                break;
            }
        }
    }
    foreach (string item in fileswithsearchitem)
    {
        Console.WriteLine(item);
    }
}
void Main()
{
    IFileSystem fileSystem = new RealFileSystem();
    DoStuff(fileSystem);
}
Then you can write an automated test like this:
class DictionaryBackedDirectory : IDirectory
{
    readonly IReadOnlyCollection<IDirectory> _directories;
    readonly IReadOnlyCollection<string> _files;
    public DictionaryBackedDirectory(
        IReadOnlyCollection<IDirectory> directories,
        IReadOnlyCollection<string> files)
    {
        _directories = directories;
        _files = files;
    }
    public IEnumerable<IDirectory> EnumerateDirectories() => _directories;
    public IEnumerable<string> EnumerateFiles(string mask) => _files; // TODO: implement masking
}
class DictionaryBackedFileSystem : IFileSystem
{
    readonly IReadOnlyDictionary<string, IDirectory> _directories;
    readonly IReadOnlyDictionary<string, Func<Stream>> _files;
    public DictionaryBackedFileSystem(
        IReadOnlyDictionary<string, IDirectory> directories,
        IReadOnlyDictionary<string, Func<Stream>> files)
    {
        _directories = directories;
        _files = files;
    }
    public IDirectory OpenDirectory(string path) => _directories[path];
    public Stream OpenFile(string path) => _files[path]();
}
void AutomatedTest()
{
    var mockFileSystem = new DictionaryBackedFileSystem(
        new Dictionary<string, IDirectory>()
        {
            [@"L:\machinelogs"] = new DictionaryBackedDirectory(
                Array.Empty<IDirectory>(), // no nested directories in this test
                new string[]
                {
                    @"L:\machinelogs\log1.log"
                }
            )
        },
        new Dictionary<string, Func<Stream>>()
        {
            [@"L:\machinelogs\log1.log"] = () => new MemoryStream() // TODO: populate the memory stream with data for the test
        }
    );
    DoStuff(mockFileSystem);
}
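Benefits of doing this:
- improved reusability
- you could implement a remote file system if you needed to
- your code becomes easier to test
- having "unit-testable code" has many advantages, and having abstractions that can be "mocked" gets you closer to that golden city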
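The test above leaves the memory stream empty. One way to fill it is sketched below; TestStreams and FromLines are hypothetical names, not part of the code above, and the sketch assumes the log content is UTF-8 text (the default for StreamReader).

```csharp
using System;
using System.IO;
using System.Text;

static class TestStreams
{
    // Hypothetical helper: builds a stream factory whose streams contain the
    // given lines as UTF-8 text, joined with '\n'.
    public static Func<Stream> FromLines(params string[] lines)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(string.Join("\n", lines));
        // A new read-only MemoryStream per call, so every OpenFile starts at position 0.
        return () => new MemoryStream(bytes, writable: false);
    }
}
```

In the test, the dictionary entry could then read `[@"L:\machinelogs\log1.log"] = TestStreams.FromLines("boot ok", "cannot access server data")`. Returning a factory instead of a single stream matters because a stream that has been read to the end, or disposed, cannot be handed out a second time.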
Outputting results more abstractly
Your code doesn't have to be bound to Console.WriteLine() or to a particular output encoding.
For example:
readonly struct Result
{
    public readonly string Path;
    public readonly string Line;
    public Result(string path, string line)
    {
        Path = path;
        Line = line;
    }
}
IEnumerable<Result> DoStuff(IFileSystem fileSystem)
{
    string SourcePath = @"L:\machinelogs";
    string filemask = "*.log";
    string searchitem = @"cannot access server data";
    IDirectory directory = fileSystem.OpenDirectory(SourcePath);
    IEnumerable<IDirectory> dirs = directory.EnumerateDirectories();
    // Append returns a new sequence; assign it so the root directory is searched too.
    dirs = dirs.Append(directory);
    foreach (var dir in dirs)
    {
        var files = dir.EnumerateFiles(filemask);
        foreach (var file in files)
        {
            using var stream = fileSystem.OpenFile(file);
            using var reader = new StreamReader(stream);
            while (reader.ReadLine() is {} line)
            {
                if (line.Contains(searchitem))
                {
                    // Every matching line is yielded; the caller decides when to stop.
                    yield return new Result(file, line);
                }
            }
        }
    }
}
void Main()
{
    IFileSystem fileSystem = new RealFileSystem();
    foreach (var result in DoStuff(fileSystem))
    {
        Console.WriteLine(result.Path + " : " + result.Line);
        break; // Could easily change this to continue searching
    }
}
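See how this moves the console interaction out of your code, makes the output format someone else's problem, and lets the consumer of your code decide whether to keep searching after a hit?
It also brings your code closer to being unit-testable. If it's not clear why, feel free to ask.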
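One thing a consumer might want back is the original behaviour: only the first hit from the newest matching file in each directory. A minimal sketch of that follows. FirstHitPerDirectory is a hypothetical name, the Result struct is repeated only to keep the snippet self-contained, and the sketch assumes results arrive newest file first within each directory, as IDirectory.EnumerateFiles promises.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

readonly struct Result
{
    public readonly string Path;
    public readonly string Line;
    public Result(string path, string line) { Path = path; Line = line; }
}

static class ResultExtensions
{
    // Hypothetical helper: keeps the first result seen for each directory.
    // GroupBy preserves the order in which elements appear, so with newest-first
    // input this is the first matching line of the newest matching file.
    public static IEnumerable<Result> FirstHitPerDirectory(this IEnumerable<Result> results) =>
        results
            .GroupBy(r => Path.GetDirectoryName(r.Path))
            .Select(g => g.First());
}
```

A caller would then write `foreach (var result in DoStuff(fileSystem).FirstHitPerDirectory())`. Note the trade-off: GroupBy has to consume the whole sequence before it yields anything, so every file gets read; the nested loops with break avoid that work.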
Injecting the parameters
The source path, file mask, and search item don't have to be hard-coded constants.
For example:
IEnumerable<Result> DoStuff(
    IFileSystem fileSystem,
    string sourcePath,
    string fileMask,
    string searchItem)
{
    IDirectory directory = fileSystem.OpenDirectory(sourcePath);
    IEnumerable<IDirectory> dirs = directory.EnumerateDirectories();
    // Append returns a new sequence; assign it so the root directory is searched too.
    dirs = dirs.Append(directory);
    foreach (var dir in dirs)
    {
        var files = dir.EnumerateFiles(fileMask);
        foreach (var file in files)
        {
            using var stream = fileSystem.OpenFile(file);
            using var reader = new StreamReader(stream);
            while (reader.ReadLine() is {} line)
            {
                if (line.Contains(searchItem))
                {
                    yield return new Result(file, line);
                }
            }
        }
    }
}
void Main()
{
    IFileSystem fileSystem = new RealFileSystem();
    foreach (var result in DoStuff(
        fileSystem,
        @"L:\machinelogs",
        "*.log",
        @"cannot access server data"
    ))
    {
        Console.WriteLine(result.Path + " : " + result.Line);
        break;
    }
}
See how this makes it possible to search for other things?
Using Path.Combine
This removes one dependency on Windows: the backslash path separator.
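void Main()
{
    IFileSystem fileSystem = new RealFileSystem();
    foreach (var result in DoStuff(
        fileSystem,
        // Caveat: Path.Combine adds no separator after a bare drive reference, so on
        // Windows this yields "L:machinelogs", a path relative to the current directory
        // on drive L. In practice, pass the root in and combine only the parts below it.
        Path.Combine("L:", "machinelogs"),
        "*.log",
        @"cannot access server data"
    ))
    {
        Console.WriteLine(result.Path + " : " + result.Line);
        break;
    }
}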
I wouldn't be surprised if my code above doesn't compile. It was written off the cuff.
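Regarding "c# - How do I find the latest file containing a particular string in each directory or subdirectory?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/73426101/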