pyspark.sql.DataFrameWriter
Interface used to write a DataFrame to external storage systems (e.g. file systems, key-value stores). Use DataFrame.write to access this.
New in version 1.4.0.
Changed in version 3.4.0: Supports Spark Connect.
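A writer is obtained from an existing DataFrame, and its configuration methods chain fluently. A minimal sketch (the session, data, and output path are placeholders):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

    # DataFrame.write returns a DataFrameWriter; each configuration
    # method returns the writer, so calls chain before the final save.
    df.write.mode("overwrite").parquet("/tmp/example_output")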
Methods
bucketBy(numBuckets, col, *cols)
Buckets the output by the given columns.
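Bucketing applies only when the output is saved as a table with saveAsTable. A minimal sketch (the bucket count, column, and table names are placeholders):

    # Hash-partition rows into 4 buckets by user_id, Hive-style.
    df.write.format("parquet").bucketBy(4, "user_id").saveAsTable("bucketed_events")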
csv(path[, mode, compression, sep, quote, …])
Saves the content of the DataFrame in CSV format at the specified path.
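A minimal sketch (the path and option values are placeholders):

    # Write CSV files with a header row and an explicit separator.
    df.write.csv("/tmp/csv_output", mode="overwrite", sep=",", header=True)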
format(source)
Specifies the underlying output data source.
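A minimal sketch; "json" stands in for any built-in or third-party source name:

    # Equivalent to df.write.json("/tmp/json_output").
    df.write.format("json").save("/tmp/json_output")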
insertInto(tableName[, overwrite])
Inserts the content of the DataFrame into the specified table.
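The target table must already exist, and columns are resolved by position rather than by name. A minimal sketch (the table name is a placeholder):

    # Append rows to an existing table; the DataFrame's column order
    # must match the table's schema.
    df.write.insertInto("existing_table", overwrite=False)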
jdbc(url, table[, mode, properties])
Saves the content of the DataFrame to an external database table via JDBC.
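A minimal sketch (the URL, table, credentials, and driver class are placeholders; the matching JDBC driver JAR must be on the Spark classpath):

    df.write.jdbc(
        url="jdbc:postgresql://localhost:5432/mydb",
        table="public.target_table",
        mode="append",
        properties={"user": "username", "password": "secret", "driver": "org.postgresql.Driver"},
    )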
json(path[, mode, compression, dateFormat, …])
Saves the content of the DataFrame in JSON format (JSON Lines text format or newline-delimited JSON) at the specified path.
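A minimal sketch (the path and date format are placeholders):

    # Writes one JSON object per line (JSON Lines).
    df.write.json("/tmp/json_output", mode="overwrite", dateFormat="yyyy-MM-dd")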
mode(saveMode)
Specifies the behavior when data or a table already exists.
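Accepted values are "append", "overwrite", "ignore", and "error" / "errorifexists" (the default). A minimal sketch (the path is a placeholder):

    # Add new files to the existing output instead of failing.
    df.write.mode("append").parquet("/tmp/parquet_output")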
option(key, value)
Adds an output option for the underlying data source.
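A minimal sketch; "header" stands in for any option understood by the chosen source:

    df.write.option("header", True).csv("/tmp/csv_output")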
options(**options)
Adds output options for the underlying data source.
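The keyword-argument form of option, for setting several options at once. A minimal sketch (the option names are CSV-specific placeholders):

    df.write.options(header=True, nullValue="NA").csv("/tmp/csv_output")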
orc(path[, mode, partitionBy, compression])
Saves the content of the DataFrame in ORC format at the specified path.
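A minimal sketch (the path and codec are placeholders):

    df.write.orc("/tmp/orc_output", mode="overwrite", compression="zlib")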
parquet(path[, mode, partitionBy, compression])
Saves the content of the DataFrame in Parquet format at the specified path.
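A minimal sketch (the path and codec are placeholders):

    df.write.parquet("/tmp/parquet_output", mode="overwrite", compression="snappy")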
partitionBy(*cols)
Partitions the output by the given columns on the file system.
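Each distinct combination of values becomes a subdirectory, e.g. year=2024/month=5/. A minimal sketch (the column names and path are placeholders):

    df.write.partitionBy("year", "month").parquet("/tmp/partitioned_output")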
save([path, format, mode, partitionBy])
Saves the contents of the DataFrame to a data source.
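The generic entry point that format, mode, and partitionBy configure. A minimal sketch (the path is a placeholder):

    df.write.save(path="/tmp/output", format="json", mode="overwrite")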
saveAsTable(name[, format, mode, partitionBy])
Saves the content of the DataFrame as the specified table.
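A minimal sketch (the database and table names are placeholders):

    df.write.saveAsTable("my_database.my_table", format="parquet", mode="overwrite")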
sortBy(col, *cols)
Sorts the output in each bucket by the given columns on the file system.
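sortBy applies only together with bucketBy and saveAsTable. A minimal sketch (the column and table names are placeholders):

    # Within each bucket, rows are sorted by event_time.
    df.write.bucketBy(4, "user_id").sortBy("event_time").saveAsTable("sorted_bucketed_events")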
text(path[, compression, lineSep])
Saves the content of the DataFrame in a text file at the specified path.
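The DataFrame must contain a single string column. A minimal sketch (the column name and path are placeholders):

    df.select("value").write.text("/tmp/text_output", compression="gzip")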