Data Format
  • 03 Sep 2024

After you select a File Format, the configuration options for that format appear.

Parquet

Parquet is a column-oriented file format: the values of each column are stored together, independently of the other columns. The data itself is organized as a table.
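To illustrate what "column-oriented" means, the sketch below pivots the same records from a row layout into a column layout in plain Python. It only shows the idea of storing each column's values together; it does not read or write real Parquet files and is not how Parquet encodes data on disk. The sample values are taken from the JSON examples later on this page.

```python
# Row-oriented layout: each record keeps all of its fields together,
# similar to how a delimited (CSV) file stores data line by line.
rows = [
    {"id": 1, "city": "Nicolestad", "income": 955289.05},
    {"id": 2, "city": "Burnsborough", "income": 15248.17},
    {"id": 3, "city": "Ericahaven", "income": 248173.49},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists.

    In-memory illustration only: all values of one column end up
    next to each other, which is the core idea of columnar formats.
    """
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)
print(columns["income"])  # [955289.05, 15248.17, 248173.49]

# Aggregating one column only touches that column's values.
total_income = sum(columns["income"])
```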


Delimited

A delimited file is a text file in which a delimiter, such as a comma, tab, or pipe, separates the values of each row into columns.

Header Row: Specifies that the first row of the file is a header row. The delimiter-separated values in the header row are used as the column names.
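The effect of the Header Row option can be illustrated with Python's standard csv module. This is an analogy for the setting, not the product's own parser, and the sample data is made up from values in the JSON examples on this page.

```python
import csv
import io

data = "first_name,priority\nBradley,1683\nJennifer,2756\n"

# Header Row enabled: the first row supplies the column names.
with_header = list(csv.DictReader(io.StringIO(data)))
# -> [{'first_name': 'Bradley', 'priority': '1683'},
#     {'first_name': 'Jennifer', 'priority': '2756'}]

# Header Row disabled: every row, including the first, is treated as data.
without_header = list(csv.reader(io.StringIO(data)))
# -> [['first_name', 'priority'], ['Bradley', '1683'], ['Jennifer', '2756']]
```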

Delimiter: The character that separates the values in each row. You can choose a predefined delimiter or define a custom one. Select the required delimiter from the dropdown.

  • Comma: Selects a comma (,) as the delimiter.
  • Tab: Selects a tab character (\t) as the delimiter.
  • Custom: Lets you define a custom delimiter, for example a semicolon, colon, pipe, or forward slash.

Selecting Custom enables the Custom Delimiter field.

Custom Delimiter: Defines the custom delimiter character for the data.

You must select Custom from the Delimiter dropdown before you can define a custom delimiter.
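The delimiter rules above can be sketched as a small helper. The option names mirror the dropdown, but `resolve_delimiter`, `parse`, and the mapping are hypothetical illustrations, not the product's API. The single-character restriction is an assumption of this sketch, because Python's csv module only accepts one-character delimiters.

```python
import csv
import io

# Hypothetical mapping from the predefined dropdown options to characters.
PREDEFINED_DELIMITERS = {"Comma": ",", "Tab": "\t"}


def resolve_delimiter(option, custom_delimiter=None):
    """Return the delimiter character for a dropdown option.

    A custom delimiter is accepted only when the option is "Custom",
    mirroring the rule that Custom must be selected first.
    """
    if option == "Custom":
        if not custom_delimiter or len(custom_delimiter) != 1:
            raise ValueError("Custom requires a single-character delimiter")
        return custom_delimiter
    if custom_delimiter is not None:
        raise ValueError("Select Custom to define a custom delimiter")
    return PREDEFINED_DELIMITERS[option]


def parse(text, option, custom_delimiter=None):
    """Split delimited text into rows using the resolved delimiter."""
    delimiter = resolve_delimiter(option, custom_delimiter)
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


print(parse("a,b\n1,2\n", "Comma"))        # [['a', 'b'], ['1', '2']]
print(parse("a\tb\n1\t2\n", "Tab"))        # [['a', 'b'], ['1', '2']]
print(parse("a|b\n1|2\n", "Custom", "|"))  # [['a', 'b'], ['1', '2']]
```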

Escape Character

An escape character combined with the character that follows it forms an escape sequence that is recognized during parsing. The escaped character is treated as literal data instead of as a text qualifier or delimiter, so the data can contain characters that would otherwise be read as delimiter occurrences.

The backslash (\) is a commonly used escape character: the character that immediately follows it is escaped.


Using an escape character reduces the need to switch between quotation marks when enclosing strings that contain punctuation. You can enclose a string in either type of quotation mark and escape any matching quotation mark that appears in the middle of it. Escaping a delimiter inside a value also prevents delimiter collision, where the delimiter would otherwise be read as a column break.
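As an illustration, Python's csv module exposes the same idea through its `escapechar` and `doublequote` parameters. The sketch assumes a backslash escape character and double-quote text qualifier; the product's parser may behave differently. It escapes both embedded quotation marks and a delimiter outside quotes.

```python
import csv
import io

# Raw line: Rachel,"She said \"yes\", then left",a\,b
line = 'Rachel,"She said \\"yes\\", then left",a\\,b\n'

# escapechar="\\": a backslash makes the next character literal.
# doublequote=False: embedded quotes must be escaped, not doubled ("").
reader = csv.reader(io.StringIO(line), escapechar="\\", doublequote=False)
row = next(reader)
print(row)
# ['Rachel', 'She said "yes", then left', 'a,b']
```

The second value keeps its inner quotation marks and comma, and the escaped comma in `a\,b` stays part of the third value instead of starting a new column.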

Text Qualifier

A text qualifier is needed when a field value contains the delimiter. If a value contains a delimiter and no text qualifier is used, the data that follows the delimiter spills into the next column.

Text qualifiers are single (') or double (") quotation marks placed around a data element to identify it as text or character data.

A text qualifier is the character used at the beginning and end of a field value. If the data uses single or double quotation marks to enclose values that contain the delimiter or other special characters, set the Text Qualifier to the same quotation mark.
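The sketch below uses the `quotechar` parameter of Python's csv module as a stand-in for the Text Qualifier setting. It shows why the qualifier must match the quotation mark used in the data: with a mismatched qualifier, the value spills into the next column.

```python
import csv
import io

line = "'Smith, John',42\n"

# Qualifier matches the data (single quote): the comma stays inside the value.
matched = next(csv.reader(io.StringIO(line), quotechar="'"))
# ['Smith, John', '42']

# Qualifier set to a double quote: the single quotes are ordinary characters,
# so the comma splits the value and the remainder spills into the next column.
mismatched = next(csv.reader(io.StringIO(line), quotechar='"'))
# ["'Smith", " John'", '42']
```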


JSON

A JSON file is a plain-text file written in JavaScript Object Notation (JSON) syntax.

Supported Structure

  • Supported JSON files contain one complete JSON record per line (a layout often called JSON Lines or newline-delimited JSON). This format is widely used for data ingestion.
    {"first_name": "Bradley", "priority": 1683, "subscribe": true, "income": 955289.05, "address": {"City": "Nicolestad", "State": "Massachusetts"}, "countries_visited": ["Turks and Caicos Islands", "Spain", "New Caledonia"], "date_of_birth": "1988-02-19 00:00:00", "null_key": null}
    {"first_name": "Jennifer", "priority": 2756, "subscribe": true, "income": 15248.17, "address": {"City": "Burnsborough", "State": "Idaho"}, "countries_visited": ["Mauritania", "Turkey", "Guinea"], "date_of_birth": "1994-08-31 00:00:00", "null_key": null}
    {"first_name": "Tyler", "priority": 2628, "subscribe": false, "income": 248173.49, "address": {"City": "Ericahaven", "State": "California"}, "countries_visited": ["Sudan", "Afghanistan", "Chad"], "date_of_birth": "1978-06-30 00:00:00", "null_key": null}
    {"first_name": "Lisa", "priority": 1518, "subscribe": false, "income": 338300.85, "address": {"City": "Tracyton", "State": "Oklahoma"}, "countries_visited": ["Honduras", "Samoa", "Congo"], "date_of_birth": "1991-08-06 00:00:00", "null_key": null}
    {"first_name": "William", "priority": 1714, "subscribe": false, "income": 950738.18, "address": {"City": "Lake Tina", "State": "Nevada"}, "countries_visited": ["Seychelles", "Vietnam", "Lebanon"], "date_of_birth": "1981-02-09 00:00:00", "null_key": null}
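A file in the supported structure can be read one line at a time with Python's standard json module. This is a minimal sketch using trimmed versions of the sample records; `read_json_lines` is a hypothetical helper, and skipping blank lines is an assumption of the sketch rather than documented product behavior.

```python
import json


def read_json_lines(text):
    """Parse newline-delimited JSON: one complete JSON object per line."""
    records = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        record = json.loads(line)
        if not isinstance(record, dict):
            raise ValueError(f"line {line_number}: expected a JSON object")
        records.append(record)
    return records


sample = (
    '{"first_name": "Bradley", "priority": 1683, "subscribe": true, "null_key": null}\n'
    '{"first_name": "Jennifer", "priority": 2756, "subscribe": true, "null_key": null}\n'
)
records = read_json_lines(sample)
print([r["first_name"] for r in records])  # ['Bradley', 'Jennifer']
print(records[0]["null_key"])  # None
```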

Not Supported Structure

  1. JSON files with pretty-printed records that span multiple lines.
    {
       "first_name":"Rachel",
       "priority":2619,
       "subscribe":false,
       "income":435324.12,
       "address":{
          "City":"Smithstad",
          "State":"Michigan"
       },
       "countries_visited":[
          "Belize",
          "Eritrea",
          "Egypt"
       ],
       "date_of_birth":"1976-06-19 00:00:00",
       "null_key":null
    }
  2. JSON files that contain records wrapped in arrays.
    [{"id":1,"name":"John Doe","email":"john.doe@example.com"},{"id":2,"name":"Jane Doe","email":"jane.doe@example.com"},{"id":3,"name":"Mike Smith","email":"mike.smith@example.com"}]
    [{"id":7,"name":"Peter Green","email":"peter.green@example.com"},{"id":8,"name":"Susan Black","email":"susan.black@example.com"},{"id":9,"name":"Michael White","email":"michael.white@example.com"},{"id":10,"name":"Jessica Green","email":"jessica.green@example.com"}]
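Before uploading, a file can be checked for both unsupported structures by parsing each line on its own. This is a hypothetical pre-check, not a product feature: a multi-line record fails to parse on its first line, and an array-wrapped line parses to a list instead of an object.

```python
import json


def check_json_lines(text):
    """Return a reason string if text is not one-object-per-line JSON, else None.

    Flags the two unsupported cases: records spread over several lines,
    and records wrapped in arrays.
    """
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            value = json.loads(line)
        except json.JSONDecodeError:
            return f"line {line_number}: not a complete JSON value (multi-line record?)"
        if isinstance(value, list):
            return f"line {line_number}: records wrapped in an array"
        if not isinstance(value, dict):
            return f"line {line_number}: expected a JSON object"
    return None


print(check_json_lines('{"id": 1}\n{"id": 2}\n'))   # None
print(check_json_lines('[{"id":1},{"id":2}]\n'))    # line 1: records wrapped in an array
```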

The selected file must be smaller than 2 MB. Avoid uploading a Gzip-compressed file; decompress it before uploading.
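Both upload rules can be checked locally before uploading. In this sketch, `check_upload` is a hypothetical helper, and interpreting "2 MB" as 2 × 1024 × 1024 bytes is an assumption; the product may use a different definition. Gzip files are detected by their two-byte magic number (`1f 8b`).

```python
import gzip
import os
import tempfile

# Assumption: "2 MB" is interpreted as 2 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 2 * 1024 * 1024
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def check_upload(path):
    """Return a list of problems that would block uploading the file."""
    problems = []
    if os.path.getsize(path) >= MAX_UPLOAD_BYTES:
        problems.append("file must be smaller than 2 MB")
    with open(path, "rb") as f:
        if f.read(2) == GZIP_MAGIC:
            problems.append("file is gzip-compressed; decompress it first")
    return problems


# Usage: compare a plain JSON Lines file with a gzip-compressed copy.
with tempfile.TemporaryDirectory() as tmp:
    payload = b'{"id": 1}\n'
    plain = os.path.join(tmp, "records.json")
    packed = os.path.join(tmp, "records.json.gz")
    with open(plain, "wb") as f:
        f.write(payload)
    with open(packed, "wb") as f:
        f.write(gzip.compress(payload))
    plain_problems = check_upload(plain)    # []
    packed_problems = check_upload(packed)  # ['file is gzip-compressed; decompress it first']
```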

The View Batch Schema button is enabled after the selected file is uploaded.

View Source Schema: Lets you view and modify the source schema selection.


Source Data Format

All parent column names found in the uploaded data file appear here.

Filter: Search for and filter specific columns. Scroll down to see the entire list.

Column list: Select or deselect columns in the left-hand drawer as required. All columns available in the source data format are selected by default.

You must select at least one column to save the schema in the batch listener configuration.
For nested data structures, only the parent column is displayed in the left-hand drawer.
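The column-selection behavior described above can be approximated in plain Python: parent columns are the top-level keys of each record, nested objects are not expanded, and at least one column must remain selected. The helpers `parent_columns` and `project` are hypothetical illustrations, not the product's implementation.

```python
def parent_columns(records):
    """Collect top-level keys in first-seen order; nested keys are not expanded."""
    seen = {}
    for record in records:
        for key in record:
            seen.setdefault(key, None)
    return list(seen)


def project(records, selected):
    """Keep only the selected parent columns; nested values are kept whole."""
    if not selected:
        raise ValueError("select at least one column")
    return [{key: record[key] for key in selected if key in record} for record in records]


records = [
    {"first_name": "Bradley", "address": {"City": "Nicolestad", "State": "Massachusetts"}},
    {"first_name": "Jennifer", "address": {"City": "Burnsborough", "State": "Idaho"}},
]
print(parent_columns(records))  # ['first_name', 'address']
print(project(records, ["address"])[0])
# {'address': {'City': 'Nicolestad', 'State': 'Massachusetts'}}
```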

JSON: The JSON schema updates dynamically as columns (keys) are selected or deselected in the left-hand drawer. It displays the standardized schema structure that must be mapped to the project schema.

The standardized schema structure is read-only.

  • After making changes, copy the entire generated schema and paste it into the defined project schema in the right-hand drawer.
  • Click Save Changes to save the configuration.
