Reading/writing files from HDFS using Python with subprocess, PIPE, Popen gives an error

I am trying to read (open) and write files in HDFS from inside a Python script, but I am getting an error. Can someone tell me what is wrong here?

Code (full): sample.py

#!/usr/bin/python

from subprocess import Popen, PIPE

print "Before Loop"

cat = Popen(["hadoop", "fs", "-cat", "./sample.txt"],
            stdout=PIPE)

print "After Loop 1"
put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
            stdin=PIPE)

print "After Loop 2"
for line in cat.stdout:
    line += "Blah"
    print line
    print "Inside Loop"
    put.stdin.write(line)

cat.stdout.close()
cat.wait()
put.stdin.close()
put.wait()

When I run:

hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.5.1.jar -file ./sample.py -mapper './sample.py' -input sample.txt -output fileRead

it executes without errors, but I cannot find the file it is supposed to create in HDFS, modifiedfile.txt.

And when I run:

 hadoop fs -getmerge ./fileRead/ file.txt

file.txt contains:

Before Loop 
Before Loop 
After Loop 1    
After Loop 1    
After Loop 2    
After Loop 2

Can anyone please tell me what I am doing wrong here? I do not think it reads from sample.txt.

Source author | 2015-01-25

1

Try changing your put subprocess so that it takes cat's stdout directly as its own stdin, by changing this
```
put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
            stdin=PIPE)
```
to this
```
put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
            stdin=cat.stdout)
```
Full script:
```
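#!/usr/bin/python

from subprocess import Popen, PIPE

print "Before Loop"

# Reader: streams ./sample.txt from HDFS into a pipe.
cat = Popen(["hadoop", "fs", "-cat", "./sample.txt"],
            stdout=PIPE)

print "After Loop 1"
# Writer: takes cat's output directly as its stdin ("-" means stdin).
put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
            stdin=cat.stdout)
cat.stdout.close()  # let cat receive SIGPIPE if put exits early
put.communicate()
cat.wait()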
```
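If the lines have to be modified on the way through (the question appends "Blah"), the direct pipe above is not enough, because Python never sees the data. A minimal sketch of the alternative follows. `pipe_through` and `add_blah` are hypothetical names, the commands are passed in as argument lists, and the code assumes Python 3. Unlike the question's loop, it places "Blah" before the newline. It has not been run against a real cluster.

```python
from subprocess import Popen, PIPE


def add_blah(line):
    """Append b"Blah" to the text of a line, keeping its trailing newline."""
    body = line.rstrip(b"\n")
    return body + b"Blah" + line[len(body):]


def pipe_through(src_cmd, dst_cmd, transform):
    """Run src_cmd, pass each output line through transform, feed it to dst_cmd.

    Returns the exit codes of both processes as (src_rc, dst_rc).
    """
    src = Popen(src_cmd, stdout=PIPE)
    dst = Popen(dst_cmd, stdin=PIPE)
    try:
        for line in src.stdout:
            dst.stdin.write(transform(line))
    finally:
        # Closing stdin signals end-of-input to the writer process.
        dst.stdin.close()
        src.stdout.close()
    return src.wait(), dst.wait()


# Example call with the commands from the question (needs a working hadoop CLI):
# pipe_through(["hadoop", "fs", "-cat", "./sample.txt"],
#              ["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
#              add_blah)
```

Checking both return codes is worthwhile here: a non-zero code from the writer is the first hint when the target file does not appear in HDFS.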
- With your answer, hadoop fs -getmerge ./fileRead/ file.txt gives me the same contents as sample.txt, but I still cannot find a file named modifiedfile.txt in HDFS. Do I have to create a file named modifiedfile in HDFS myself? I thought this would create a new file with that name in HDFS. Am I wrong? And what do I do if I want to write something to the output file?
- How can I connect to a remote HDFS?
Source author: dopstar
0

Can anyone please tell me what I am doing wrong here?

Your sample.py might not be a proper mapper. A mapper probably accepts its input on stdin and writes the result to its stdout, e.g., blah.py:
```
#!/usr/bin/env python
import sys

for line in sys.stdin: # print("Blah\n".join(sys.stdin) + "Blah\n")
    line += "Blah"
    print(line)
```
Usage:
```
$ hadoop ... -file ./blah.py -mapper './blah.py' -input sample.txt -output fileRead
```
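The mapper logic can be checked locally before it is submitted as a streaming job. The sketch below is a variant of blah.py under some assumptions: `run_mapper` is a hypothetical helper that takes its input and output streams as parameters so that it can be exercised with in-memory streams, it targets Python 3, and it puts "Blah" before the newline instead of after it.

```python
import io


def run_mapper(stream_in, stream_out):
    """Copy each input line to stream_out with "Blah" appended before the newline."""
    for line in stream_in:
        stream_out.write(line.rstrip("\n") + "Blah\n")


# Local check with in-memory streams instead of a Hadoop job.
demo_out = io.StringIO()
run_mapper(io.StringIO("first\nsecond\n"), demo_out)
print(demo_out.getvalue(), end="")

# As a streaming mapper, the script would instead end with:
#     import sys
#     run_mapper(sys.stdin, sys.stdout)
```

Because a streaming job collects whatever the mapper writes to stdout, this is presumably also why the "Before Loop" and "After Loop" lines printed by sample.py ended up in the job output.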
Source author: jfs
