### Please describe the problem. I have a "person" attribute with the name of the people on the picture. I could create views and filter on that some time ago, but it looks like git annex now chokes on some illegal (accented?) characters. I have non-ASCII characters in names for sure. ### What steps will reproduce the problem? Try to create a view using the "person" attribute. ### What version of git-annex are you using? On what operating system? 5.20150617-gac09445 on ArchLinux ### Please provide any additional information below. [[!format sh """ % git annex view person="*" /bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8) view (searching...) git-annex: fd:14: commitBuffer: invalid argument (invalid character) failed git-annex: view: 1 failed % git annex version /bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8) git-annex version: 5.20150617-gac09445 build flags: Assistant Webapp Webapp-secure Pairing Testsuite S3 WebDAV Inotify DBus DesktopNotify XMPP DNS Feeds Quvi TDFA key/value backends: SHA256E SHA1E SHA512E SHA224E SHA384E SKEIN256E SKEIN512E MD5E SHA256 SHA1 SHA512 SHA224 SHA384 SKEIN256 SKEIN512 MD5 WORM URL remote types: git gcrypt S3 bup directory rsync web bittorrent webdav tahoe glacier ddar hook external local repository version: 5 supported repository version: 5 upgrade supported from repository versions: 0 1 2 4 """]] > I'm assuming the setlocale part of this is a misconfigured system locale; > as also seen by an arch linux user in > > > So, disregarding that part of the bug report, we still have the actual > failure. > > With LANG=C, setting and getting metadata like "Rondò Veneziano" fails, > as does generating views of that metadata. > > In all cases, it's an IO encoding failure, "commitBuffer: invalid argument (invalid character)" > > This only occurs when there's a space in the metadata; in this case the > value is base64ed. While the 'ò' comes back out as "\242", which is the right > character, it's not encoded using the filesystem encoding. This means that > the IO layer can't handle it, when not in a unicode locale. Instead, it > needs to come back out as "\56515\56498". > > Apparently this is a reversion; it worked in an earlier version of > git-annex. Commits such as 9b93278e8abe1163d53fbf56909d0fe6d7de69e9 > or the conversion to Sandi may have caused the reversion, unsure. > > Fix is to apply the filesystem encoding when decoding base64ed values. > [[done]] --[[Joey]]