final class DataFrameWriter[T] extends AnyRef
Interface used to write a Dataset to external storage systems (e.g. file systems,
key-value stores, etc). Use Dataset.write to access this.
- Annotations
- @Stable()
- Source
- DataFrameWriter.scala
- Since
- 1.4.0 
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any

- final def ##(): Int
- Definition Classes
- AnyRef → Any

- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any

- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
 
- def bucketBy(numBuckets: Int, colName: String, colNames: String*): DataFrameWriter[T]
Buckets the output by the given columns. If specified, the output is laid out on the file system similar to Hive's bucketing scheme, but with a different bucket hash function and is not compatible with Hive's bucketing.
This is applicable for all file-based data sources (e.g. Parquet, JSON) starting with Spark 2.1.0.
- Annotations
- @varargs()
- Since
- 2.0 
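A minimal sketch of a bucketed write, assuming a local SparkSession and a throwaway warehouse directory. The table name `events_bucketed`, the column names, and the bucket count are hypothetical. The sketch writes through `saveAsTable`, on the assumption that bucketed output needs a catalog table to record the bucketing metadata.

```scala
import org.apache.spark.sql.SparkSession
import java.nio.file.Files

// Local session; the warehouse location is a temp folder (assumption for this sketch).
val warehouse = Files.createTempDirectory("warehouse").toString
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("bucketBy-sketch")
  .config("spark.sql.warehouse.dir", warehouse)
  .getOrCreate()
import spark.implicits._

// Hypothetical data: (userId, event).
val events = Seq((1, "click"), (2, "view"), (1, "view")).toDF("userId", "event")

// Hash rows into 4 buckets by userId and register the result as a table.
events.write
  .mode("overwrite")
  .format("parquet")
  .bucketBy(4, "userId")
  .saveAsTable("events_bucketed")

// Read the table back through the catalog.
val rowsBack = spark.table("events_bucketed").count()
```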
 
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
 
- def csv(path: String): Unit
Saves the content of the DataFrame in CSV format at the specified path. This is equivalent to:
    format("csv").save(path)
You can find the CSV-specific options for writing CSV files in Data Source Option in the version you use.
- Since
- 2.0.0 
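As an illustration, the following sketch writes a small DataFrame as CSV with a header row and reads it back. The data, the temp output path, and the choice of the `header` option are assumptions made for the example.

```scala
import org.apache.spark.sql.SparkSession
import java.nio.file.Files

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("csv-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical data and output location.
val people = Seq(("Ann", 31), ("Bob", 45)).toDF("name", "age")
val out = Files.createTempDirectory("csv-sketch").resolve("people").toString

// Shorthand for format("csv").save(out); "header" asks for a header row.
people.write.option("header", true).csv(out)

// Reading with the same option recovers the column names.
val readBack = spark.read.option("header", true).csv(out)
```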
 
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef

- def equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any

- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
 
- def format(source: String): DataFrameWriter[T]
Specifies the underlying output data source. Built-in options include "parquet", "json", etc.
- Since
- 1.4.0 
 
- final def getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()

- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
 
- def insertInto(tableName: String): Unit
Inserts the content of the DataFrame to the specified table. It requires that the schema of the DataFrame is the same as the schema of the table.
- Since
- 1.4.0
- Note
- Unlike saveAsTable, insertInto ignores the column names and just uses position-based resolution. For example:
    scala> Seq((1, 2)).toDF("i", "j").write.mode("overwrite").saveAsTable("t1")
    scala> Seq((3, 4)).toDF("j", "i").write.insertInto("t1")
    scala> Seq((5, 6)).toDF("a", "b").write.insertInto("t1")
    scala> sql("select * from t1").show
    +---+---+
    |  i|  j|
    +---+---+
    |  5|  6|
    |  3|  4|
    |  1|  2|
    +---+---+
  Because it inserts data to an existing table, format or options will be ignored.
- SaveMode.ErrorIfExists and SaveMode.Ignore behave as SaveMode.Append in insertInto, as insertInto is not a table creating operation.
 
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
 
- def jdbc(url: String, table: String, connectionProperties: Properties): Unit
Saves the content of the DataFrame to an external database table via JDBC. If the table already exists in the external database, the behavior of this function depends on the save mode, specified by the mode function (defaults to throwing an exception).
Don't create too many partitions in parallel on a large cluster; otherwise Spark might crash your external database systems.
You can find the JDBC-specific option and parameter documentation for storing tables via JDBC in Data Source Option in the version you use.
- table
- Name of the table in the external database.
- connectionProperties
- JDBC database connection arguments, a list of arbitrary string tag/value. Normally at least a "user" and "password" property should be included. "batchsize" can be used to control the number of rows per insert. "isolationLevel" can be one of "NONE", "READ_COMMITTED", "READ_UNCOMMITTED", "REPEATABLE_READ", or "SERIALIZABLE", corresponding to standard transaction isolation levels defined by JDBC's Connection object, with default of "READ_UNCOMMITTED".
- Since
- 1.4.0 
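The connection properties described above could be assembled as in the sketch below. The helper names `jdbcProperties` and `appendViaJdbc`, the batch size, the isolation level, and the partition cap of 4 are all hypothetical choices. The URL and table are left to the caller, and a matching JDBC driver is assumed to be on the classpath.

```scala
import java.util.Properties
import org.apache.spark.sql.{DataFrame, SaveMode}

// Hypothetical helper: builds the properties named in the parameter description.
def jdbcProperties(user: String, password: String): Properties = {
  val props = new Properties()
  props.setProperty("user", user)
  props.setProperty("password", password)
  props.setProperty("batchsize", "1000")               // rows per insert (arbitrary value)
  props.setProperty("isolationLevel", "READ_COMMITTED") // one of the listed levels
  props
}

// Hypothetical helper: appends df to an external table.
// coalesce(4) reduces the number of partitions written in parallel,
// following the warning about overloading the external database.
def appendViaJdbc(df: DataFrame, url: String, table: String, props: Properties): Unit =
  df.coalesce(4)
    .write
    .mode(SaveMode.Append)
    .jdbc(url, table, props)
```

The write is wrapped in a function rather than executed, since it only succeeds against a reachable database.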
 
- def json(path: String): Unit
Saves the content of the DataFrame in JSON format (JSON Lines text format or newline-delimited JSON) at the specified path. This is equivalent to:
    format("json").save(path)
You can find the JSON-specific options for writing JSON files in Data Source Option in the version you use.
- Since
- 1.4.0 
 
- def mode(saveMode: String): DataFrameWriter[T]
Specifies the behavior when data or table already exists. Options include:
- overwrite: overwrite the existing data.
- append: append the data.
- ignore: ignore the operation (i.e. no-op).
- error or errorifexists: default option, throw an exception at runtime.
- Since
- 1.4.0 
 
- def mode(saveMode: SaveMode): DataFrameWriter[T]
Specifies the behavior when data or table already exists. Options include:
- SaveMode.Overwrite: overwrite the existing data.
- SaveMode.Append: append the data.
- SaveMode.Ignore: ignore the operation (i.e. no-op).
- SaveMode.ErrorIfExists: throw an exception at runtime.
The default option is ErrorIfExists.
- Since
- 1.4.0 
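The four save modes can be compared side by side on one output path. This is a sketch under assumed conditions: a local SparkSession, a temp directory, and Parquet as the format.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}
import java.nio.file.Files
import scala.util.Try

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("mode-sketch")
  .getOrCreate()

// Hypothetical output location and a three-row DataFrame.
val path = Files.createTempDirectory("mode-sketch").resolve("numbers").toString
val numbers = spark.range(3).toDF("n")

numbers.write.mode(SaveMode.Overwrite).parquet(path) // creates or replaces the data
numbers.write.mode(SaveMode.Append).parquet(path)    // adds three more rows
numbers.write.mode(SaveMode.Ignore).parquet(path)    // no-op, data already exists
val total = spark.read.parquet(path).count()

// No mode set, so ErrorIfExists applies and the write to an existing path fails.
val secondWrite = Try(numbers.write.parquet(path))
```

The String overload takes the same choices as lowercase names, e.g. `mode("append")`.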
 
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef

- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()

- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
 
- def option(key: String, value: Double): DataFrameWriter[T]
Adds an output option for the underlying data source. All options are maintained in a case-insensitive way in terms of key names. If a new option has the same key case-insensitively, it will override the existing option.
- Since
- 2.0.0

- def option(key: String, value: Long): DataFrameWriter[T]
Adds an output option for the underlying data source. All options are maintained in a case-insensitive way in terms of key names. If a new option has the same key case-insensitively, it will override the existing option.
- Since
- 2.0.0

- def option(key: String, value: Boolean): DataFrameWriter[T]
Adds an output option for the underlying data source. All options are maintained in a case-insensitive way in terms of key names. If a new option has the same key case-insensitively, it will override the existing option.
- Since
- 2.0.0

- def option(key: String, value: String): DataFrameWriter[T]
Adds an output option for the underlying data source. All options are maintained in a case-insensitive way in terms of key names. If a new option has the same key case-insensitively, it will override the existing option.
- Since
- 1.4.0 
 
- def options(options: java.util.Map[String, String]): DataFrameWriter[T]
Adds output options for the underlying data source. All options are maintained in a case-insensitive way in terms of key names. If a new option has the same key case-insensitively, it will override the existing option.
- Since
- 1.4.0

- def options(options: scala.collection.Map[String, String]): DataFrameWriter[T]
(Scala-specific) Adds output options for the underlying data source. All options are maintained in a case-insensitive way in terms of key names. If a new option has the same key case-insensitively, it will override the existing option.
- Since
- 1.4.0 
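The sketch below combines the `option` overloads with the Scala-specific `options` and sets the same key twice with different casing. Going by the description above, the later `HEADER` entry should replace the earlier `header` one. The data, separator, and temp path are assumptions for the example.

```scala
import org.apache.spark.sql.SparkSession
import java.nio.file.Files

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("options-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical data and output location.
val people = Seq(("Ann", 31), ("Bob", 45)).toDF("name", "age")
val out = Files.createTempDirectory("options-sketch").resolve("people").toString

people.write
  .option("header", true)            // Boolean overload
  .option("sep", ";")                // String overload
  .options(Map("HEADER" -> "false")) // same key in a different case, set later
  .csv(out)

// Inspect the raw lines that were written.
val lines = spark.read.text(out).as[String].collect().toSet
```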
 
- def orc(path: String): Unit
Saves the content of the DataFrame in ORC format at the specified path. This is equivalent to:
    format("orc").save(path)
ORC-specific option(s) for writing ORC files can be found in Data Source Option in the version you use.
- Since
- 1.5.0

- def parquet(path: String): Unit
Saves the content of the DataFrame in Parquet format at the specified path. This is equivalent to:
    format("parquet").save(path)
Parquet-specific option(s) for writing Parquet files can be found in Data Source Option in the version you use.
- Since
- 1.4.0 
 
- def partitionBy(colNames: String*): DataFrameWriter[T]
Partitions the output by the given columns on the file system. If specified, the output is laid out on the file system similar to Hive's partitioning scheme. As an example, when we partition a dataset by year and then month, the directory layout would look like:
- year=2016/month=01/
- year=2016/month=02/
Partitioning is one of the most widely used techniques to optimize physical data layout. It provides a coarse-grained index for skipping unnecessary data reads when queries have predicates on the partitioned columns. In order for partitioning to work well, the number of distinct values in each column should typically be less than tens of thousands.
This is applicable for all file-based data sources (e.g. Parquet, JSON) starting with Spark 2.1.0.
- Annotations
- @varargs()
- Since
- 1.4.0 
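The year/month layout described above can be reproduced with a small sketch. The data, column names, and temp path are hypothetical. The partition values are strings so that the directory names match the `month=01` form used in the description.

```scala
import org.apache.spark.sql.SparkSession
import java.io.File
import java.nio.file.Files

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("partitionBy-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical data: (year, month, amount).
val sales = Seq(
  ("2016", "01", 10.0),
  ("2016", "02", 12.5),
  ("2017", "01", 7.0)
).toDF("year", "month", "amount")
val out = Files.createTempDirectory("partition-sketch").resolve("sales").toString

// One directory level per partition column, in the order given.
sales.write.partitionBy("year", "month").parquet(out)
val janDir = new File(out, "year=2016/month=01")

// A predicate on a partition column lets the reader skip other directories.
val rows2016 = spark.read.parquet(out).where($"year" === 2016).count()
```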
 
- def save(): Unit
Saves the content of the DataFrame as the specified table.
- Since
- 1.4.0

- def save(path: String): Unit
Saves the content of the DataFrame at the specified path.
- Since
- 1.4.0 
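A minimal sketch of both `save` variants, assuming a local SparkSession and JSON as the format. For the no-argument form, the sketch assumes the destination is supplied through the `path` option. The temp paths are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import java.nio.file.Files

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("save-sketch")
  .getOrCreate()

val ids = spark.range(5).toDF("id")
val base = Files.createTempDirectory("save-sketch")
val explicitPath = base.resolve("explicit").toString
val optionPath = base.resolve("via-option").toString

// save(path): format chosen with format(), destination passed directly.
ids.write.format("json").save(explicitPath)

// save(): destination taken from the "path" option (assumption of this sketch).
ids.write.format("json").option("path", optionPath).save()

val explicitCount = spark.read.json(explicitPath).count()
val optionCount = spark.read.json(optionPath).count()
```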
 
- def saveAsTable(tableName: String): Unit
Saves the content of the DataFrame as the specified table.
In the case the table already exists, behavior of this function depends on the save mode, specified by the mode function (default to throwing an exception). When mode is Overwrite, the schema of the DataFrame does not need to be the same as that of the existing table.
When mode is Append, if there is an existing table, we will use the format and options of the existing table. The column order in the schema of the DataFrame doesn't need to be the same as that of the existing table. Unlike insertInto, saveAsTable will use the column names to find the correct column positions. For example:
    scala> Seq((1, 2)).toDF("i", "j").write.mode("overwrite").saveAsTable("t1")
    scala> Seq((3, 4)).toDF("j", "i").write.mode("append").saveAsTable("t1")
    scala> sql("select * from t1").show
    +---+---+
    |  i|  j|
    +---+---+
    |  1|  2|
    |  4|  3|
    +---+---+
In this method, save mode is used to determine the behavior if the data source table exists in Spark catalog. We will always overwrite the underlying data of data source (e.g. a table in JDBC data source) if the table doesn't exist in Spark catalog, and will always append to the underlying data of data source if the table already exists.
When the DataFrame is created from a non-partitioned HadoopFsRelation with a single input path, and the data source provider can be mapped to an existing Hive builtin SerDe (i.e. ORC and Parquet), the table is persisted in a Hive compatible format, which means other systems like Hive will be able to read this table. Otherwise, the table is persisted in a Spark SQL specific format.
- Since
- 1.4.0 
 
- def sortBy(colName: String, colNames: String*): DataFrameWriter[T]
Sorts the output in each bucket by the given columns.
This is applicable for all file-based data sources (e.g. Parquet, JSON) starting with Spark 2.1.0.
- Annotations
- @varargs()
- Since
- 2.0 
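Since the sort applies within each bucket, a sketch of `sortBy` pairs it with `bucketBy`. The table name `events_sorted`, the columns, and the bucket count are hypothetical, and the write goes through `saveAsTable` on the assumption that bucketed output needs a catalog table.

```scala
import org.apache.spark.sql.SparkSession
import java.nio.file.Files

// Local session; the warehouse location is a temp folder (assumption for this sketch).
val warehouse = Files.createTempDirectory("warehouse").toString
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("sortBy-sketch")
  .config("spark.sql.warehouse.dir", warehouse)
  .getOrCreate()
import spark.implicits._

// Hypothetical data: (userId, timestamp, event).
val events = Seq((1, 30L, "click"), (2, 10L, "view"), (1, 20L, "view"))
  .toDF("userId", "ts", "event")

// Bucket by userId, then order the rows inside each bucket by ts.
events.write
  .mode("overwrite")
  .format("parquet")
  .bucketBy(4, "userId")
  .sortBy("ts")
  .saveAsTable("events_sorted")

val sortedRows = spark.table("events_sorted").count()
```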
 
- final def synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
 
- def text(path: String): Unit
Saves the content of the DataFrame in a text file at the specified path. The DataFrame must have only one column that is of string type. Each row becomes a new line in the output file. For example:
    // Scala:
    df.write.text("/path/to/output")

    // Java:
    df.write().text("/path/to/output")
The text files will be encoded as UTF-8.
You can find the text-specific options for writing text files in Data Source Option in the version you use.
- Since
- 1.6.0 
 
- def toString(): String
- Definition Classes
- AnyRef → Any

- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )

- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )

- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()