I need to generate file names from user-entered names. These names could be in any language. For example:
- "John Smith"
- "محمد سعيد بن عبد العزيز الفلسطيني"
These are user-entered values, so I have no guarantee that the names contain only characters that are valid in file names.
Users will be downloading these files from their browser, so I need to ensure the file names are valid on all operating systems in all configurations.
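I am currently doing this for English-speaking countries by stripping everything except ASCII letters, digits, and whitespace, then collapsing each run of whitespace into an underscore:
// Whitespace must survive the first pass, otherwise the second pass has nothing to replace.
string = string.replaceAll("[^a-zA-Z0-9\\s]", "");
string = string.replaceAll("\\s+", "_");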
Some example conversions:
- "John Smith" -> "John_Smith.ext"
- "John O'Henry" -> "John_OHenry.ext"
- "John van Smith III" -> "John_van_Smith_III.ext"
Obviously this does not work internationally.
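To illustrate what "working internationally" might mean for this approach, here is a minimal sketch that swaps the ASCII ranges for Unicode property classes. It assumes that Unicode letters, combining marks, and digits are acceptable in a file name on the target systems, which is exactly the part I have not been able to confirm. The class name `UnicodeNameSanitizer`, the method `toFileName`, and the `"file"` fallback are hypothetical names chosen for the example.

```java
import java.util.regex.Pattern;

public class UnicodeNameSanitizer {
    // Everything except Unicode letters, combining marks, digits, and whitespace/separators.
    private static final Pattern NOT_ALLOWED =
            Pattern.compile("[^\\p{L}\\p{M}\\p{N}\\s\\p{Z}]");
    // Runs of ASCII whitespace or Unicode separator characters (e.g. no-break space).
    private static final Pattern SEPARATORS = Pattern.compile("[\\s\\p{Z}]+");
    // Underscores left at either end after collapsing separators.
    private static final Pattern EDGE_UNDERSCORES = Pattern.compile("^_+|_+$");

    public static String toFileName(String name, String extension) {
        String cleaned = NOT_ALLOWED.matcher(name).replaceAll("");
        cleaned = SEPARATORS.matcher(cleaned).replaceAll("_");
        cleaned = EDGE_UNDERSCORES.matcher(cleaned).replaceAll("");
        if (cleaned.isEmpty()) {
            cleaned = "file"; // assumption: arbitrary fallback when nothing survives
        }
        return cleaned + "." + extension;
    }

    public static void main(String[] args) {
        System.out.println(toFileName("John O'Henry", "ext"));
        System.out.println(toFileName("محمد سعيد بن عبد العزيز الفلسطيني", "ext"));
    }
}
```

This keeps the same allowlist idea as the two `replaceAll` calls above; it says nothing about whether every remaining character is valid on every file system.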
I've considered finding or generating a blacklist of every character that is invalid on at least one file system and stripping those from the names, but I've been unable to find a comprehensive list.
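For concreteness, a minimal sketch of the blacklist idea might look like the following. It only covers the restrictions Windows is known to impose (the reserved characters `< > : " / \ | ? *`, ASCII control characters, trailing dots and spaces, and device names such as `CON` or `LPT1`), so it should be read as a starting point rather than the comprehensive list. The class name `ReservedCharStripper` and the choice to prefix problem names with an underscore are assumptions made for the example.

```java
import java.util.regex.Pattern;

public class ReservedCharStripper {
    // Characters Windows reserves in file names, plus ASCII control characters.
    private static final Pattern RESERVED_CHARS =
            Pattern.compile("[<>:\"/\\\\|?*\\p{Cntrl}]");
    // Windows rejects names that end in a dot or a space.
    private static final Pattern TRAILING_DOTS_SPACES = Pattern.compile("[. ]+$");
    // Reserved device names, with or without an extension, case-insensitive.
    private static final Pattern DEVICE_NAME =
            Pattern.compile("(?i)^(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])(\\..*)?$");

    public static String strip(String name) {
        String cleaned = RESERVED_CHARS.matcher(name).replaceAll("");
        cleaned = TRAILING_DOTS_SPACES.matcher(cleaned).replaceAll("").trim();
        if (cleaned.isEmpty() || DEVICE_NAME.matcher(cleaned).matches()) {
            cleaned = "_" + cleaned; // assumption: prefixing is an acceptable way out
        }
        return cleaned;
    }

    public static void main(String[] args) {
        System.out.println(strip("a/b:c*?.txt"));
        System.out.println(strip("CON"));
    }
}
```

Unlike the allowlist approach, this leaves apostrophes and non-Latin letters untouched, which is the attraction of a blacklist here.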
I'd prefer to use existing code in a common library if possible. I imagine this is an already solved problem, but I can't find a solution that works internationally.
The filename is for the user downloading the file, not for me. I'm not going to be storing these files. These files are dynamically generated by the server upon request from data in a database. The filenames are for the convenience of the person downloading the file.
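Because the name only ever reaches the user through the download response, it is typically carried in the `Content-Disposition` header. A minimal sketch of building that header value, assuming the `filename*` form from RFC 6266 (with RFC 5987 percent-encoding) and Java 10 or later for the `URLEncoder.encode(String, Charset)` overload; the class and method names are hypothetical:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class ContentDispositionSketch {
    // Builds an "attachment" header value with a UTF-8 percent-encoded file name.
    public static String headerValue(String fileName) {
        String encoded = URLEncoder.encode(fileName, StandardCharsets.UTF_8)
                .replace("+", "%20")   // URLEncoder writes spaces as '+'; the header needs %20
                .replace("*", "%2A");  // '*' is left as-is by URLEncoder but is not allowed unencoded here
        return "attachment; filename*=UTF-8''" + encoded;
    }

    public static void main(String[] args) {
        System.out.println(headerValue("John Smith.ext"));
    }
}
```

This only addresses how a non-ASCII name is transmitted; which characters the user's operating system will then accept when saving the file is still the open question.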