Introduction
Filtering the files in a directory by extension, name pattern, size, or modification date is a common task in every language. Python offers glob, pathlib, and os.listdir/os.scandir with manual filtering. JavaScript (Node.js) provides fs.readdir with manual filtering, or glob packages from npm. Bash uses shell globbing and find, and C# (.NET) provides Directory.GetFiles and Directory.EnumerateFiles. Each approach trades simplicity (glob patterns) against flexibility (manual filtering with full access to file metadata via stat). The key decisions are whether you need recursive traversal, and whether you are filtering on the file name alone or on metadata such as size and modification time.
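To make those two decisions concrete, here is a minimal Python sketch. The function name `find_files` and its parameters are hypothetical, not part of any library: it matches names with a glob pattern (recursively or not), then applies optional metadata filters that each need a stat() call.

```python
import time
from pathlib import Path


def find_files(directory, pattern, recursive=False, min_size=None, max_age_seconds=None):
    """Hypothetical helper: name filter via glob, metadata filter via stat()."""
    root = Path(directory)
    # Decision 1: recursive traversal (rglob) or a single directory (glob)
    candidates = root.rglob(pattern) if recursive else root.glob(pattern)
    cutoff = time.time() - max_age_seconds if max_age_seconds is not None else None
    results = []
    for path in candidates:
        if not path.is_file():
            continue  # skip directories whose names happen to match the pattern
        # Decision 2: metadata filters require one stat() call per file
        info = path.stat()
        if min_size is not None and info.st_size <= min_size:
            continue
        if cutoff is not None and info.st_mtime <= cutoff:
            continue
        results.append(path)
    return sorted(results)
```

For example, `find_files('/data', '*.csv')` filters on the name only, while `find_files('/project', '*.py', recursive=True, min_size=10_000)` also pays for a stat() call on every match.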
Python: glob and pathlib
```python
import glob
import time
from pathlib import Path

# glob: filter by extension
csv_files = glob.glob('/data/*.csv')
print(csv_files)
# e.g. ['/data/sales.csv', '/data/users.csv']

# Recursive matching with ** (requires recursive=True)
all_py = glob.glob('/project/**/*.py', recursive=True)

# pathlib: modern, object-oriented approach
data_dir = Path('/data')

# Filter by extension
csv_files = list(data_dir.glob('*.csv'))

# Recursive
all_py = list(Path('/project').rglob('*.py'))

# Filter by multiple extensions (is_file() skips directories named like "x.png")
images = [f for f in data_dir.iterdir()
          if f.is_file() and f.suffix.lower() in ('.png', '.jpg', '.gif')]

# Filter by size (files over 1 MB)
large_files = [f for f in data_dir.iterdir()
               if f.is_file() and f.stat().st_size > 1_000_000]

# Filter by modification time (last 24 hours = 86400 seconds)
cutoff = time.time() - 86400
recent = [f for f in data_dir.iterdir()
          if f.is_file() and f.stat().st_mtime > cutoff]
```
Python: os.listdir and fnmatch
```python
import os
import fnmatch

directory = '/var/log'

# os.listdir + list comprehension
log_files = [f for f in os.listdir(directory) if f.endswith('.log')]

# fnmatch for Unix shell-style pattern matching
matching = [f for f in os.listdir(directory)
            if fnmatch.fnmatch(f, 'app-*.log')]

# os.scandir: more efficient (DirEntry objects carry file type info)
with os.scandir(directory) as entries:
    files_only = [e.name for e in entries
                  if e.is_file() and e.name.endswith('.log')]

# os.walk for recursive traversal
for root, dirs, files in os.walk('/project'):
    for name in files:
        if name.endswith('.py'):
            print(os.path.join(root, name))
```
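os.scandir is preferred over os.listdir because it returns DirEntry objects rather than bare names. Each DirEntry carries the file type reported by the directory scan, so is_file() and is_dir() usually need no extra system call. On Windows the scan also supplies most stat fields; on POSIX systems, entry.stat() still makes one system call per entry, but the result is cached on the DirEntry.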
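Because a DirEntry already knows whether it is a file or a directory, a recursive search can be built directly on os.scandir. The sketch below is a simplified stand-in for os.walk, not a replacement for it; the name `scan_tree` is hypothetical, and it deliberately does not follow symbolic links to directories.

```python
import os


def scan_tree(directory, suffix):
    """Hypothetical helper: lazily yield paths of files ending in `suffix`."""
    subdirs = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # is_dir()/is_file() usually reuse the type from the directory scan
            if entry.is_dir(follow_symlinks=False):
                subdirs.append(entry.path)  # recurse after this handle is closed
            elif entry.is_file() and entry.name.endswith(suffix):
                yield entry.path
    for sub in subdirs:
        yield from scan_tree(sub, suffix)
```

Collecting subdirectories first and recursing after the `with` block keeps only one directory handle open at a time, which matters in deep trees.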
JavaScript (Node.js)
```javascript
const fs = require('fs');
const path = require('path');

// Basic filtering
const dir = '/data';
const csvFiles = fs.readdirSync(dir)
  .filter(file => file.endsWith('.csv'));

// With file stats
const largeFiles = fs.readdirSync(dir)
  .filter(file => {
    const stats = fs.statSync(path.join(dir, file));
    return stats.isFile() && stats.size > 1_000_000;
  });

// Async version
async function getFilteredFiles(dir, ext) {
  const entries = await fs.promises.readdir(dir, { withFileTypes: true });
  return entries
    .filter(entry => entry.isFile() && entry.name.endsWith(ext))
    .map(entry => entry.name);
}

// Recursive readdir: the recursive option and Dirent.parentPath
// are both available in Node 20.12+ (or 18.20+)
async function findFiles(dir, pattern) {
  const entries = await fs.promises.readdir(dir, {
    withFileTypes: true,
    recursive: true,
  });
  return entries
    .filter(e => e.isFile() && e.name.match(pattern))
    .map(e => path.join(e.parentPath, e.name));
}
```
Bash
```bash
# Shell globbing: files matching a pattern
ls /data/*.csv

# Find with extension filter
find /data -name "*.csv" -type f

# Find with multiple extensions
find /data -type f \( -name "*.jpg" -o -name "*.png" \)

# Find by size (files over 1 MiB; the M suffix means 1,048,576 bytes)
find /data -type f -size +1M

# Find by modification time (last 24 hours)
find /data -type f -mtime -1

# Find with maxdepth (no recursion)
find /data -maxdepth 1 -name "*.log" -type f

# Combine filters (over 10 KiB, modified in the last 7 days)
find /project -type f -name "*.py" -size +10k -mtime -7

# Pipe to other commands (-print0 / -0 keep names with spaces intact)
find /data -name "*.csv" -type f -print0 | xargs -0 wc -l
```
C# (.NET)
```csharp
using System.IO;
using System.Linq;  // required for Where, Select, Contains, and ToList

// Filter by extension
string[] csvFiles = Directory.GetFiles("/data", "*.csv");

// Search recursively
string[] allPy = Directory.GetFiles("/project", "*.py",
    SearchOption.AllDirectories);

// EnumerateFiles: lazy evaluation (better for large directories)
var largeFiles = new DirectoryInfo("/data")
    .EnumerateFiles()
    .Where(f => f.Length > 1_000_000)
    .Select(f => f.FullName)
    .ToList();

// Multiple filters
var images = Directory.EnumerateFiles("/photos")
    .Where(f => new[] { ".jpg", ".png", ".gif" }
        .Contains(Path.GetExtension(f).ToLowerInvariant()));
```
Common Pitfalls
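Not handling permission errors: Directories may contain files or subdirectories you lack read access to, and each API reacts differently. os.scandir and Path.iterdir() raise PermissionError, and fs.readdir fails with EACCES; os.walk silently skips unreadable directories unless you pass an onerror callback; Directory.GetFiles with SearchOption.AllDirectories throws UnauthorizedAccessException; find prints "Permission denied" to stderr and keeps going. Wrap traversal in try/except (Python) or try/catch (JavaScript, C#), pass onerror to os.walk, and in bash redirect find's stderr or, with GNU find, skip unreadable directories using ! -readable -prune.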
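A minimal sketch of reporting these failures instead of crashing or silently skipping, assuming you want a list of errors to inspect afterwards. `walk_matching` passes an onerror callback to os.walk, which receives the OSError for any directory it cannot list; `list_matching` shows the try/except form for a single os.scandir call. Both names are hypothetical.

```python
import os


def walk_matching(top, suffix):
    """Hypothetical helper: return (matching paths, errors encountered)."""
    errors = []
    matches = []
    # Without onerror, os.walk would skip unreadable directories silently
    for root, dirs, files in os.walk(top, onerror=errors.append):
        for name in files:
            if name.endswith(suffix):
                matches.append(os.path.join(root, name))
    return matches, errors


def list_matching(directory, suffix):
    """Hypothetical helper: single-directory listing that tolerates EACCES."""
    try:
        with os.scandir(directory) as entries:
            return [e.name for e in entries
                    if e.is_file() and e.name.endswith(suffix)]
    except PermissionError as exc:
        print(f"skipping {directory}: {exc}")
        return []
```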
Using os.listdir instead of os.scandir: os.listdir returns only names, so checking a file's type or size requires a separate os.stat call for each file. os.scandir returns DirEntry objects whose type checks usually need no extra system call and whose stat() result is cached, which makes filtered listings faster, especially when filtering by file type.
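The difference is easiest to see side by side. Both functions below (hypothetical names) return the names of regular files larger than a threshold: the first calls os.stat once per name, while the second asks each DirEntry, which can usually answer is_file() from the scan and caches its stat() result.

```python
import os
import stat


def large_files_listdir(directory, min_bytes):
    result = []
    for name in os.listdir(directory):
        st = os.stat(os.path.join(directory, name))  # one system call per name
        if stat.S_ISREG(st.st_mode) and st.st_size > min_bytes:
            result.append(name)
    return sorted(result)


def large_files_scandir(directory, min_bytes):
    with os.scandir(directory) as entries:
        return sorted(
            e.name for e in entries
            # is_file() usually needs no system call; stat() is cached on the entry
            if e.is_file() and e.stat().st_size > min_bytes
        )
```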
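Glob patterns not matching hidden files: In bash and in Python's glob module, a pattern like * does not match names starting with . (dotfiles). In Python, add an explicit '.*' pattern, pass include_hidden=True to glob.glob (Python 3.11+), or use pathlib, whose Path.glob() and Path.iterdir() do include hidden files. In bash, enable shopt -s dotglob. Node's fs.readdir and find list hidden files by default.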
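The difference between the glob module and pathlib can be checked directly. A small sketch, assuming a directory that holds both a regular and a hidden CSV file; the helper name `csv_names` is hypothetical.

```python
import glob
import os
from pathlib import Path


def csv_names(directory):
    """Hypothetical helper comparing how each API treats dotfiles."""
    # glob module: '*' skips names that start with '.'
    glob_default = sorted(os.path.basename(p)
                          for p in glob.glob(os.path.join(directory, '*.csv')))
    # an explicit leading '.' in the pattern picks hidden files up
    glob_hidden = sorted(os.path.basename(p)
                         for p in glob.glob(os.path.join(directory, '.*.csv')))
    # pathlib's Path.glob matches dotfiles with '*'
    pathlib_all = sorted(p.name for p in Path(directory).glob('*.csv'))
    return glob_default, glob_hidden, pathlib_all
```

With `data.csv` and `.hidden.csv` present, the three lists would be `['data.csv']`, `['.hidden.csv']`, and `['.hidden.csv', 'data.csv']`.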
Case-sensitive extension matching: On Linux and macOS, bash globs and Python's glob module match case-sensitively, so *.CSV does not match data.csv. Normalize with .lower() in Python, .toLowerCase() in JavaScript, or .ToLowerInvariant() in C#, or use find -iname in bash for case-insensitive matching.
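A minimal sketch of case-insensitive matching that behaves the same on every platform: lower-case both the name and the pattern, then use fnmatch.fnmatchcase so the operating system's own case rules are not applied on top. The helper name `match_ci` is hypothetical.

```python
import fnmatch
import os


def match_ci(directory, pattern):
    """Hypothetical helper: case-insensitive shell-style match on file names."""
    pattern = pattern.lower()
    with os.scandir(directory) as entries:
        return sorted(
            e.name for e in entries
            if e.is_file() and fnmatch.fnmatchcase(e.name.lower(), pattern)
        )
```

On Python 3.12 and later, Path.glob() also accepts case_sensitive=False, which covers the same need for pathlib users.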
Loading the entire directory listing into memory: os.listdir, Directory.GetFiles, and fs.readdir build the full list of entries before returning. For directories with millions of files, iterate lazily instead: os.scandir in Python, Directory.EnumerateFiles in C#, or fs.promises.opendir in Node.js, which returns a directory handle you can consume with for await, one entry at a time.
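As a sketch of lazy filtering, assuming you only need the first few matches from a very large directory: a generator over os.scandir yields names as they are read, and itertools.islice stops early without building the full list. `iter_matching` and `first_matches` are hypothetical names.

```python
import itertools
import os


def iter_matching(directory, suffix):
    """Hypothetical helper: lazily yield names of files ending in `suffix`."""
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file() and entry.name.endswith(suffix):
                yield entry.name


def first_matches(directory, suffix, limit):
    gen = iter_matching(directory, suffix)
    try:
        # islice pulls at most `limit` names from the generator
        return list(itertools.islice(gen, limit))
    finally:
        gen.close()  # exits the generator's `with`, closing the directory handle
```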
Summary
Python: use pathlib.Path.glob() or rglob() for pattern matching, and iterdir() with list comprehensions for custom filters
Node.js: use fs.readdirSync with .filter() or fs.promises.readdir with { withFileTypes: true } for async
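Bash: use find for flexible filtering by name, size, time, and type
C#: use Directory.GetFiles with a search pattern, or EnumerateFiles with LINQ for lazy, metadata-based filters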
Prefer lazy iterators (os.scandir, EnumerateFiles) over eager list-loading for large directories
Always handle permission errors during traversal, and account for case sensitivity and hidden files when filtering by name