Use Python or Shell to answer this challenge


For this challenge, the task is to read a text file containing lists of filenames with extensions and determine which names are unique per row, ignoring the file extensions.

For example, consider the following text file:

foo.mp3|bar.txt|baz.mp3
wub.mp3|wub.mp3|wub.mp3|wub.txt|wub.png
quux.mp3|quux.txt|thud.mp3

The expected output for this file is

foo.mp3|bar.txt|baz.mp3

thud.mp3

After removing extensions, all three names are unique on line 1 so the entire line is unchanged.

However, after removing extensions on line two, none of the wub files is unique, so none of them is included in the output at all.

For line three, after removing extensions, files with the name quux are non-unique and are removed from the output. thud.mp3 is unique and is included in the output.

Notes

Filenames in the text file are strictly alphanumeric with a single period. No paths are involved. The delimiter is always a pipe. Each line should be operated on independently from all others; no logic carries forward from line to line. Files won't be more than 500 lines and lines will never be longer than 100 characters.

I wasn't able to get this working in Python.

My code:

def find_unique_filenames(text):
    result = []

    for line in text.split('\n'):
        unique_names = set()
        filenames = line.strip().split('|')

        for filename in filenames:
            name_without_extension = filename.split('.')[0]
            unique_names.add(name_without_extension)

        result_line = '|'.join(unique_names)
        result.append(result_line)

    return '\n'.join(result)

# Uncomment the next line if you want to test this module independently
# print(find_unique_filenames("foo.mp3|bar.txt|baz.mp3\nwub.mp3|wub.mp3|wub.mp3|wub.txt|wub.png\nquux.mp3|quux.txt|thud.mp3"))

There are 3 answers

Daweo On
unique_names = set()
...
unique_names.add(name_without_extension)

Using a set this way collapses repeated elements into a single one, whereas you are expected to find the elements that appear exactly once.

Consider the following simple example. Suppose you are working with this data:

A|B|C|C|C|D|E

If you are tasked with finding the distinct elements, the answer is

A|B|C|D|E

If you are tasked with finding the elements that appear exactly once, the answer is

A|B|D|E

You might use collections.Counter for counting elements.
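
As a rough sketch of that suggestion applied to the function from the question: the version below keeps the name find_unique_filenames, but the helper stem is an illustrative name that is not in the original code. It counts each extension-less name per line and keeps only the original filenames whose name was counted once.

```python
from collections import Counter


def stem(filename):
    # Illustrative helper (not in the question): drop the extension.
    # Filenames are stated to contain exactly one period.
    return filename.split('.')[0]


def find_unique_filenames(text):
    result = []
    for line in text.split('\n'):
        filenames = line.strip().split('|')
        # Count how often each extension-less name occurs on this line only.
        counts = Counter(stem(f) for f in filenames)
        # Keep the full filenames, in their original order, whose name occurs once.
        result.append('|'.join(f for f in filenames if counts[stem(f)] == 1))
    return '\n'.join(result)


sample = ("foo.mp3|bar.txt|baz.mp3\n"
          "wub.mp3|wub.mp3|wub.mp3|wub.txt|wub.png\n"
          "quux.mp3|quux.txt|thud.mp3")
print(find_unique_filenames(sample))
```

Unlike the set-based version, this keeps the extensions and the original order, because it filters the input filenames instead of joining the collected names.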

Alain T. On

You could use the Counter class to count the prefixes (without extension) and then filter files on names where the counter is 1:

text = """foo.mp3|bar.txt|baz.mp3
wub.mp3|wub.mp3|wub.mp3|wub.txt|wub.png
quux.mp3|quux.txt|thud.mp3"""

from collections import Counter

for line in text.split("\n"):
    c = Counter(n.rsplit(".",1)[0] for n in line.split("|") )

    r = "|".join(n for n in line.split("|") if c[n.rsplit(".",1)[0]]==1)

    print(r)

## foo.mp3|bar.txt|baz.mp3
##
## thud.mp3
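
The challenge reads its input from a text file, not from a string. A minimal sketch of the same Counter approach applied to a file might look like the following. The function names filter_line and filter_file are hypothetical, and UTF-8 encoding is an assumption.

```python
from collections import Counter


def filter_line(line):
    # Hypothetical helper: keep the filenames whose extension-less
    # name occurs exactly once on this line.
    names = line.split("|")
    counts = Counter(n.rsplit(".", 1)[0] for n in names)
    return "|".join(n for n in names if counts[n.rsplit(".", 1)[0]] == 1)


def filter_file(path):
    # Hypothetical helper: process every line independently and
    # return the list of output lines (UTF-8 is assumed).
    with open(path, encoding="utf-8") as handle:
        return [filter_line(line.rstrip("\n")) for line in handle]
```

Calling filter_file with the path of the input file and printing each returned item reproduces the output above, one output line per input line.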

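You could also do this, albeit less efficiently, using the str.count function, as long as each searched name is anchored by a leading "|" and a trailing "." so that a name such as foo does not also match foobar:

for line in text.split("\n"):

    # "|name." only matches whole names: a "|" is prepended to the line and
    # every filename contains exactly one period
    r = "|".join(n for n in line.split("|")
                   if ("|"+line).count("|"+n.rsplit(".",1)[0]+".")==1)

    print(r)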

As a middle ground between using a library class and efficiency, you could use a dictionary to map each prefix to the list of names that share it, then keep only the lists that contain a single item:

for line in text.split("\n"):

    prefix = dict()
    for name in line.split("|"):
        key = name.rsplit(".",1)[0]
        prefix.setdefault(key,[]).append(name)

    r = "|".join(v[0] for v in prefix.values() if len(v)==1)

    print(r)

Kaz On

TXR Lisp, in the interactive listener:

This is the TXR Lisp interactive listener of TXR 293.
Quit with :quit or Ctrl-D on an empty line. Ctrl-X ? for cheatsheet.
Reminder: your account balance of 37 closing parentheses is past due.
1> (each ((line (spl "\n" "foo.mp3|bar.txt|baz.mp3\n \
                           wub.mp3|wub.mp3|wub.mp3|wub.txt|wub.png\n \
                           quux.mp3|quux.txt|thud.mp3")))
     (flow line
       (spl "|")
       (group-by (op trim-right #/\.[^\/]+*/))
       hash-values
       (keep-if (opip len (eql 1)))
       (join-with "|")
       put-line))
foo.mp3|baz.mp3|bar.txt

thud.mp3
nil