A feature list for Stencila Sheets


#1

@michael, @oliver and I have been planning the next six months of development and user testing on Stencila Sheets. We’re keen to get input from the community on which features should put in, and which we should leave out, and the relative priority for each.

The plan is to implement initial versions of these features over the coming months, ready for a first round of user testing in early November. Based on that feedback, we’ll know which features need more work, and potentially, which features should be dropped.

For background on Stencila Sheets check out last year’s blog posts:

We’ll link to some relevant Github issues below. We’d love any comments either on those specific issues, or right here in this forum topic!

-1. Conventional spreadsheet features that Stencila Sheets won’t have!

“Knowing what to leave out is just as important as knowing what to focus on.” - Warren Buffet

One of the fundamental principles of Stencila Sheets is to provide users an environment for conducting reproducible research within a interface that they are already familiar with - the spreadsheet. But we’re not just trying to create an open-source version of Excel - that already in exists e.g. LibreOffice Calc.

We want to create spreadsheet software that is built from the ground up for reproducibility and which addresses some of the shortcomings of using spreadsheets for data analysis. Part of that process is taking the great things about spreadsheets (e.g. a reactive programming environment) and leaving behind the things that lead to errors. That’s why this section is numbered -1.

We want to make Stencila Sheets more semantic and data-centric and less layout and formatting centric. So, some spreadsheet formatting features you won’t see in Stencila Sheets include:

  • font face and font size menu items
  • text and background colour menu items
  • cell border style, colour and width

By not allowing ad hoc formatting of cells we hope to focus the user on the types and structure of data and formulae in the Sheet. It also allows us to use styling attributes to visually distinguish different types of data and cell types. We do plan to have a conditional formatting feature but this will be very much data-driven (see below).

0. Conventional spreadsheet features that Stencila Sheets will have!

Continuing with our section numbering scheme, this section is numbered 0 because it’s about implementing the basic interactions and functionality that users expect in a spreadsheet. There is of course a long, long list of spreadsheet features but initially we’re focused on the basics of creating, updating, selecting and deleting cells and entering data and formulae. Over time we’ll be adding more of the advanced features that users expect from spreadsheets (e.g. plotting).

@oliver and @michael have already made good progress on this. Check out the screenshots - it’s basic functionality, but foundational to establishing a user interface that users are already familar with! And there’s a lot of thought gone into making it these foundational interactions efficient and extensible.

For all of these “conventional” features, we plan on sticking close to existing spreadsheet interface conventions. Microsoft Excel and Google Sheets have had a lot of user interaction research put into them and provide really good examples to follow. Users are already familiar with them and it seems wasteful and unnecessary to reinvent the wheel. Instead, we’ll put that energy into the novel features of Stencila Sheets that we think make them more suitable for reproducible research…

1. Harnessing the power of open-source languages for scientific computing

1a. Allowing users to use cell expressions in external languages

The initial idea that sparked Stencila Sheets was to take the model of the computational notebook - code cells embedded within a document - and apply it to the spreadsheet interface. Here’s an example of the first prototype of Stencila Sheets with cell expressions written in R:

We have overhauled the architecture of Stencila so that there is greater decoupling between the user interfaces and the code execution contexts. We also have a useful abstraction layer for passing data between execution contexts. These changes now allow us to have Sheets with cells in a variety of languages.

To allow cell expressions to be written in external languages such as R will require us to revisit our approach to processing of expressions to expand out cell ranges such as A1:A10. See: https://github.com/stencila/stencila/issues/334

A more advanced but potentially powerful feature related to external languages is “projecting” tabular data values onto the Sheet. For example, one group of columns might be generated by an SQL expression which extracts data from a database, another cell might fit a linear regression model to those data using R. We would need some sort of cell “projection” or “mapping” so that, in this example, if the SQL statement returned tabular data of 10 rows x 5 columns, it would be projected across the 50 adjacent cells. See: https://github.com/stencila/stencila/issues/118

1b. Allowing users to write functions in external languages

The other big change in Stencila in the last year is the introduction of Mini. Mini, as it’s name suggests, is a small, simple, purely functional language similar to Excel’s cell formula language.

Mini will allow users to define functions using external languages. This allows more complex code that does not easily fit into a cell to be edited and promotes resusability. It allows users to quickly wrap powerful functionality available in other languages and makes them accessible to other Stencila users. For example a Mini lasso function for Lasso Regression could be written using R’s glmnet package.

As one reader of our initial blog post suggested:

Functions should not ‘live’ in cells. Function definitions should simply be part of the spreadsheet’s code. The text-representation of the sheet would include not only a series of cell definitions (coordinates of cells and their contents) but proper functions as well. - https://thoughtstreams.io/joeld/problems-with-excel/10205/

It’s an idea @michael expanded on here: https://medium.com/@_mql/reproducible-science-for-people-who-dont-code-5850abe806db

For a discussion of implementation of this, see: https://github.com/stencila/stencila/issues/310

2. Strong typing of cells, columns and rows

One of the biggest problems with most spreadsheet implementations is “weak typing” - cells can contain any type of value. This combined with auto-conversion can lead to problems like the recently publicized curruption of gene data:

One of the great ideas that @oliver has come up with recently is adding strong typing to Sheets. Users could specify that a column, row, cell range or cell was of a particular type (e.g. date, string). If data was entered into the cell, or a cell formula returned a value, which did not conform to the specified type the cell would show an error. Cells with specified types could be visually distinguished (e.g. by colour, or small icon in the corner).

Strong typing would be optional but could be encouraged by providing users with a metric of the proportion of cells that were strongly typed as an indication of the sheet’s “robustness”.

3. Distinguish between static and dynamic cells

Distinguishing between cells that are static (a.k.a constant, data) and those that are dynamic (i.e. cells starting with an = sign, a.k.a. expressions, formulae) could also reduce the error rate of spreadsheets by clearly separating data from code. Again, this could be done by using colours or icons (the first prototype used a small equals sign in the top left).

4. Naming cells, columns and rows

Named cells and cell ranges is a feature available in Excel. It makes cell expressions more readable and less error prone. We want to extend this approach to the ability to name entire columns (and possibly rows). When combined with strong typing this would allow for sheets to be used to edit well defined tabular data, for example, conforming to the Frictionless Data Tabular Data Specification.

5. Clone cells

One of the biggest sources of errors with spreadsheets is when cell formulae are copy and pasted, or dragged into adjacent cells. Software like Excel will create copies of the formula with cell references updated, so that for example if cell B1 has the formula =A1*2, when it is copied into cell B2 the formula is updated to =A2*2. That’s great until you want to change the formula, then you have to remember to update all the copies as well - that can be error prone.

To solve that problem Ernő Zalka suggested the idea of “shadow” or “clone” cells: https://github.com/stencila/stencila/issues/170

6. Test cells

One often cited criticism of spreadsheets is that they are not testable. In our initial blog posts on Stencila Sheets we discussed how it is fairly straightforward to introduce testing to spreadsheets by allow cells to be marked as “test” cells.

These cells would have an expression which represented a test assertion e.g. = B1 >= 0 && B1 <= 100 to test that the percentage calculated in cell B1 is valid. In our initial prototype we indicated test cells using a ? mark and a tick or cross for pass/fail:

We also were able to generate test coverage statistics for the sheet.

7. Conditional formatting

Although we want to make Stencila Sheets more semantic we think there is potential value in data-driven, conditional formatting. Conditional formatting can allow easy identification of outlier data and out-of-expected-bounds expression values. It is in effect a form of data visualization.

Conditional formatting would not allow users to arbitrarily format cells. Instead they would need to specify a set of mappings (or “rules”, or “encodings”) between a cells value and it’s style (probably a restricted set of styling attributes e.g background colour, text colour, text weight). See: https://github.com/stencila/stencila/issues/97

Over to you!

Please give us your suggestions! It’s your chance to have a say on where Stencila Sheets is heading. Tell us which of these features you think are most important, which are unnecessary, and which need changing.

To help us prioritize, vote for the seven features that you think should be the highest priority (leaving two out). And reply to this post to give us your additional thoughts!

  • 1a Allowing users to write cell expressions in external languages
  • 1b Allowing users to write functions in external languages
  • 2 Strong typing of cells, columns and rows
  • 3 Distinguish between static and dynamic cells
  • 4 Naming cells, columns and rows
  • 5 Clone cells
  • 6 Test cells
  • 7 Conditional formatting
  • 8 Constraints (see below)

0 voters


Roadtrip Roundup! What we learned from 5 weeks on the road
Toads in Vancouver: using Stencila to teach SQL and R at UBC
#2

Hello everyone!
I had a few “hallelujah” moments when reading this post. In particular,

Yes, please. ASAP. People often rely on non-standard formatting choices to signal information to themselves. I am guilty of this myself, as one of the labs I where I worked did this as a matter of course. These formatting choices do not enrich the data, rather they make it more confusing to the outside viewer. Let any color in cells be standard and meaningful! Goodbye color and line weight choices. I will not shed a tear.

Yes please. Anything we can do to prevent auto-formatting tragedies will save researchers time and effort and make data reusable.

Yes please! Identifying static vs dynamic cells by eye will reduce clicking and make human error detection easier. I am all for it.


#3

I’ve just come across this paper published on PeerJ last week by Karl Broman​ and Kara Woo:

It’s sooo relevant to this discussion and well worth a read. To check that we’re heading in the right direction, I’m going to grab some excerpts from the paper, quote them here and make some suggestions on how we could design Stencila Sheets to encourage users to follow these recommendations.

At this stage we think it’s best for Stencila Sheets to retain the “scratchpad” flexibility of spreadsheets by having permissive defaults (e.g. weakly typed, unnamed cells). However, we may be able to encourage these sorts of practices, either through:

  • good practice auto-suggestions (e.g. "This column looks like it could be of type date" - urrrgh, flashbacks to Clippy!) and/or,
  • good practice metrics (e.g. “This sheet has 80% of data strongly typed and 90% test coverage of formulae”)

1. “Spreadsheets are best suited to data entry and storage”

Spreadsheets are often used as a multipurpose tool for data entry, storage, analysis, and visualization. Most spreadsheet programs allow users to perform all of these tasks, however we believe that spreadsheets are best suited to data entry and storage, and that analysis and visualization should happen separately. Analyzing and visualizing data in a separate program, or at least in a separate copy of the data le, reduces the risk of contaminating or destroying the raw data in the spreadsheet.

I agree that spreadsheets are good for data entry and storage. But they are also really good for small scale analyses and simulation - they allow you to easily see your data and do something with it quickly. I think the real strength of spreadsheets is the live, reactive programming model which is great for prototyping and understanding data analysis and simulation modelling.

Nonetheless, I do agree that spreadsheets which conflate data storage, data analysis and data-presentation are error prone. Having your data easily visible and being able to do some quick analysis on it in the same window is one of the powerful features of spreadsheets. But spreadsheets get error prone when it is hard to distinguish between data and analysis, and between data and formatting.

To address this, we propose to:

  • visually distinguishing between static (ie data) and dynamic (ie derived, calculated) cells
  • not allowing ad-hoc formatting

2. “Be consistent”

The first rule of data organization is be consistent. Whatever you do, do it consistently. Entering and organizing your data in a consistent way from the start will prevent you and your collaborators from having to spend time harmonizing the data later.

Broman & Woo suggest several ways to be consistent:

Use consistent codes for categorical variables (e.g. don’t mix M and male)

  • Related to the topic of strong typing above, we could allow user defined categorical types. For example, a column could have a sex type with the only valid values being male and female. The categorical types would be similar to factors in R.

  • There could be a library of types that users could add their own custom types (e.g. species names, countries) to. This could encourage consistency among users as well as within a sheet

Use a consistent fixed code for any missing values (e.g. NA instead of a blank cell)

  • We have a null type in our type system (which by the way is converted to a NA when passed to R). We could make it so that if the string “null” is typed into a cell that it actually means null, not the string “null”.

Use consistent variable names (e.g. don’t mix Glucose10wk and gluc10weeks)

  • For Mini we have been considering implementing a strict naming convention. At this stage we’re favouring the all_lower_case_underscore_separated convention. We want to avoid the situation that arises in many languages of multiple naming conventions (which happens a lot in R for example). Also, having consistent naming allows for some “smart defaults” for how data are presented e.g. consider this bit of Mini code and resulting plot from a Stencila Document (note the plot axes labels):

  • We could carry any naming convention from Mini across to cell, row and column names in Sheets to encourage consistent naming.

Use consistent subject identifiers (e.g. don’t mix 153, mouse153, mouse-153F)

  • This is similar to strongly typed categorical variables above but for identifiers or “keys” which may have an arbitrary number of different values.

  • We could allow users to define column or cell range constraints. This would allow for an arbitrary number of different values e.g. (mouse153, mouse154 …) but would constrain them to match a certain patterns (e.g. mouse\d+ for this string base variable).

  • The Frictionless Data Table Schema specification includes a model for specifying constraints including string patterns and number ranges

  • Constaints would be an attribute of cell, columns, rows in addition to typeing e.g a number column could have a list of constraints including minimum: 0, maximum: 1000

  • User defined, custom types could have constraints attached to them. For example, a mouse_weight type could be specified which extends the number type by having a constraint of a minimum value of 0.

Use a consistent data layout in multiple files

  • I think this would be difficult for us to address easily so should probably to be considered out of scope for the Stencila feature set - at least initially.

Use consistent file names

  • Again, out of scope initially, although this is something that we could revisit when defining a format/layout for a “reproducible bundle”

Use a consistent format for all dates

  • This is addressed by having a date type and allowing users to strongly type columns.

  • Sencila Sheets won’t do automatic conversion to dates (the thing that caused the oft-cited errors with gene data) but they will have an algorithm for converting date strings to dates (and then probably displaying them as YYYY-MM-DD)

Use consistent phrases in your notes

  • Could be addressed using constraints as per “Use consistent subject identifiers”, although less likely to be applied to notes because they re more freeform by their nature

Be careful about extra spaces within cells (e.g. “male” not "male ")

  • Addressed by stong typing and/or constraints

3. “Choose good names for things”

It is important to pick good names for things. This can be hard, and so it is worth putting some time and thought into it. As a general rule, don’t use spaces, either in variable names or file names.

Ah, the old naming problem. I completely agree with the recommendation of “The main principle in choosing names, whether for variables or for file names, is short,but meaningful. So not too short.” (although, I often seem to lean towards names that are too short for some peoples tastes).

Beyond the idea of having a strict naming convention (above) I don’t think there is a lot Stencila Sheets can do here so consider it out of scope (having a Clippy style auto-suggestion when you enter long names could be a bit creepy and very annoying).

4. Write dates as YYYY-MM-DD

When entering dates, we strongly recommend using the global “ISO 8601” standard, YYYY-MM-DD, such as 2013-02-27.

Addressed above under “Use a consistent format for all dates”

5. No empty cells

Fill in all cells. Use some common code for missing data. Not everyone agrees with us on this point (for example, White et al. (2013) state a preference for leaving cells blank), but we would prefer to have \NA" or even a hyphen in the cells with missing data, to make it clear that the data are known to be missing rather than unintentionally left blank.

We will allow blank cells in Stencil Sheets but not in strongly typed columns (or cells, or rows). For strongly typed cells, empty values would be converted to null. It would work like this:

  • you start a new sheet, all cells are blank,
  • you enter some data into a column leaving some cells empty for missing values
  • you click on the “type” tool to specify that the column is of type “number”
  • the empty cells automatically get converted to null values

6. “Put just one thing in a cell”

The cells in your spreadsheet should each contain one piece of data. Don’t put more than one thing in a cell

For example, you might have a column with “plate position” as “plate-well”, such as “13-A01”. It would be better to separate this into “plate” and “well” columns (containing “13” and “A01”)…

This is really important but I can’t think of an easy way that Sheets could address this beyond having strongly typed columns and encouranging their use (a user would think about which type to use for a “plate-well” column and realise that it would be better to split it into two).

7. “Make it a rectangle”

The best layout for your data within a spreadsheet is as a single big rectangle with rows corresponding to subjects and columns corresponding to variables. The first row should contain variable names, and please don’t use more than one row for the variable names.

Having named and strongly typed columns, combined with “good practices” metrics, could encourage users to use a rectangular layout for data.

For example, consider this example of a spreadsheet with a non-rectangular layout from Broman & Woo’s paper:

This would score 0% for the “proportion of data cells that are strongly typed” metric and 0% for the “proportion of columns that are named”. There might be red exclamation icons in the status bar to indicate poor performance on these metrics.

That would encourage the user to rearrange the spreadsheet to a rectangular format which would score 100% on both metrics:

Going beyond encouragement, we are considering having a version of Sheets, dubbed “Datatables”, that would only allow named, rectangular data - something that matches a specification like the Frictionless Data Tabular Data Resource and could be used to edit CSV files, database tables or even a R data frame. At this stage, we’re not sure having a separate interface for that is necessary.

8. “Create a data dictionary”

It is helpful to have a separate file that explains what all of the variables are. It is helpful if this is laid out in rectangular form, so that the data analyst can make use of it in analyses. Such a “data dictionary” might contain:

  • The exact variable name as in the data file
  • A version of the variable name that might be used in data visualizations
  • A longer explanation of what the variable means
  • The measurement units
  • Expected minimum and maximum values

This is recommendation about including metadata. In Sheets, we could allow meta data such as longer descriptions and measurement units to be applied directly to each column. An alternative, mechanism for adding meta-data to sheets could be via custom types that extend base types (as described above). For example, a user might specify a mouse_weight type which:

  • extends the number types,
  • has a text description of how it was collected,
  • measurement units of grams,
  • and a single constraint of a minimum value of 0

9. “No calculations in the raw data files”

Often, the Excel files that our collaborators send us include all kinds of calculations and
graphs. We feel strongly that your primary data file should contain just the data and
nothing else: no calculations, no graphs.

This could be encouraged by a (neutral) indicator in the status bar showing whether a Sheet has any formulae in it. If we go ahead with the Datatables interface, then it would not allow any formulae to be entered into cells - everything would have to be data. This would provide the clear distinction that Broman & Woo recommend between a “spreadsheet for data” and a “spreadsheet for analysis”

10. “Don’t use font color or highlighting as data”

You might be tempted to highlight particular cells with suspicious data, or rows that should be ignored. Or the font or font color might have some meaning. Instead, add another column with an indicator variable (for example, “trusted”, with values TRUE or FALSE).

Formatting-as-data won’t be possible in Sheets because ad-hoc formatting of cell’s won’t be possible. The reverse, data-reflected-in-formatting, will be possible through conditional formatting - we think it can be useful for quickly understanding your data and identifying suspicious values.

11. “Make backups”

Make regular backups of your data. In multiple locations. And consider using a formal version control system, like git, though it is not ideal for data files. If you want to get a bit fancy, maybe look at dat (https://datproject.org/)

This is outside of our current focus for Sheets - getting the user interface right. However, we do want to make it easy for users to store their Sheets wherever they want. And as it happens, we’ve been working on a Dat storage backend for Stencila:

12. “Use data validation to avoid errors”

Regarding the task of data entry, it is important to ensure that the process is as error-free and repetitive-stress-injury-free as possible. One useful tool for avoiding data entry errors is the “data validation” feature in Excel to control the type of data or the values that users can enter into a cell

This directly addressed by strong typing and/or constraints (see above)

13. “Save the data in plain text files”

Keep a copy of your data files in a plain text format, with comma or tab delimiters…this sort of non-proprietary file format does not and never will require any sort of special software. And CSV files are easier to handle in code.

Again, this is somewhat outside of our current focus on the user interface. But I agree that it’s really important for interoperability to be able to do this.

Indeed, the ability to store Sheets (including cell formulas) in plain text formats was something that people saw as useful in our initial prototype because it allows for better integration with version control tools like Git and Github. It’s something we’ll revisit in a few months but last year’s post A spreadsheet file format for humans might be of interest.

Summary

Many of Broman and Woo’s reccomendations are addressed by strong typeing, one the 7 features that we had in our original list. The main additional user interface features, identified here, that could help with these recommendations are:

  • constraints (a.k.a “data validation”)
  • “good practice” metrics (in addition to test coverage metrics described above)

Let us know what you think!


Roadtrip Roundup! What we learned from 5 weeks on the road
#4

Attached is a first mockup for a possible UI to respect column names and strict types. If the type is violated the cell is colored. Additionally a warn sign is displayed in the sticky column header (to make you aware something is wrong even if the problematic cell is out of the viewport.

This is just a start… more to come. Let me know what you think!


New Stencila Forum!
#5

Wow, this is incredibly comprehensive. I need some time to go into details, but just wanted to give respect for such amazing work and vision!


#6

While I’m thinking more and more about this data sheet vs. analysis sheet, I had the following idea:

  • we could start with a default sheet where cell types are set to ‘auto’ (much like Excel)
  • we display a data-quality idiciator (e.g. at the bottom of the sheet)
  • now the user starts to enter data in a correct form (and specifying column names + types)
  • with some metrics we can determine if a sheet qualifies as a “data package” and we could signal that with a green data quality indicator.

I think a lot of the problems with Excel could be solved with best practices. However these can be easier established if tools actively support the user. My thinking was this:

  • don’t limit the user from the beginning: often they may just want to do simple calculation sheet (auto cells are best suited for that)
  • ability to transform an untyped sheet into a data-sheet by meeting quality measures
  • one interface for both use-cases (so the user does not need to choose between a data table and a sheet interface)
  • we can warn the user in several places (e.g. when they want to export as CSV, our export dialog could make suggestions on what they could do to increase the quality of the data before exporting it)

Thoughts?


#7

@michael, totally agree, this is where my thinking was going too. You have expressed it well - default allows flexibility but quality metrics, displayed in a status bar, encourage good practices.

There has been quite a lot of work done by others on spreadsheet quality metrics. Much of this relates to detecting errors in formulas, rather than improving data structure, but there is probably a lot to be learned from these experiences and we might also include some of these formula-based quality metrics too. A quick smattering of some of the literature:


Development Update: Basic user interactions implemented for Stencila Sheets
#8

When responding to this poll (and it’s really hard to pick what to exclude) I’m personally really excited a few of the features. First, it’s all about the impact and potential of 1a and 1b - allowing users to write cell expressions and functions in external languages is a game-changer. This will allow me to use the capability of the language I want to use, and retain the easy to follow look of a spreadsheet for my colaborators. Second to that, test cells are a really exciting feature. And lastly, all the improvements that will bring clarity to the Sheet and reduce human error (strong typing, naming, and easily distinguishable dynamic/static cells).


#9

I have now voted and looks like I am the sole person who wants to clone cells - having had so many failures trying to copy formulae, this speaks to me! Happy to be voted down as I am not a big user of sheets and I am sure lots of people with more knowledge and experience should make the call, but I have voted for what I would like to see done. I do like the idea of being able to add external code - the 1a and 1b things are great.

Keep up the good work - people are watching and waiting!


#10

Sorry for my primitive thinking, I’m not very good at this topic to boldly state something, but I will try.

I think there is a lot of sense behind this statement:

Spreadsheets are best suited to data entry and storage

While I’m also not 100% agree with that, it could be some sort of guideline.
Let’s say we are not agree with this statement, but what is important here is that data and analysis should be separated.
Let’s look at this from reproducible science or reproducible computations. Why do we need this?
First, it is important to reproduce data analysis to test hypothesis by other members of science community or other collective mind, right?
Second, it is important to reuse that data in other studies (well, in some cases) or made other analysis to test the same hypothesis.
And this is a strike against things like mixing data and analysis in the same sheet. Yes, like in your screenshot ;).

I would see it more like a separation between source (raw) data and analysis, so you can’t destroy data and mix data with your interpretation.
By the way, there is a thing in excel-like products - it’s called pivot table and it’s very useful for some kind of analysis and it won’t destroy yours data.

On the other hand, all constraints is here to formalise some sort of approach which is good, but in the same time it could disappoint some users because of lack of flexibility.


#11

For my own research purposes (mainly dealing with flat files, representing behavioral data from human subjects), naming and cloning would make the biggest difference, especially when using as a collaborative tool with research staff.


#12

Hi,

I’m picking out parts of the original post to comment on, or say yay/nay:

we hope to focus the user on the types and structure of data and formulae in the Sheet. It also allows us to use styling attributes to visually distinguish different types of data and cell types.

+1 on this. I could see this leading to important differentiating UX from Stencila Sheets and normal spreadsheets.

The initial idea that sparked Stencila Sheets was to take the model of the computational notebook - code cells embedded within a document - and apply it to the spreadsheet interface

The gif with this comment shows code expressions written inline in cells. It seems to me an area for Stencila UX focus could be around a model editor interface for writing code cells. I definitely do not mean some full blown text editor, but I do mean breaking out of the tiny interfaces we are used to when writing expressions in spreadsheet software.

Strong typing

Yes yes yes. I enable data validation all the time in Google Sheets. This is crucial for what I think the target users of Stencila are, and it really well aligns with the work and interests I have at Open Knowledge International towards data quality etc.

Distinguish between static and dynamic cells

I can see the utility when authoring, but less so when just reading a sheet. Maybe best as an option that can be enabled? Thinking of myself, I definitely want to be able to interact with a sheet in terms of the data it shows me, and not always in terms of the computations that generate the data I see.

Naming cells, columns and rows

+1

Test cells

Interested to see this used! I think it is a great idea.

Spreadsheets are best suited to data entry and storage

It is easy to forget that a spreadsheet, in terms of the way any user interacts with one, is a visualisation, and data entry and storage is towards this purpose of visualisation. A table view on data is a visualisation of the data. To that end, I’d suggest, especially in the context of Stencila Sheets, that spreadsheets are best suited to data exploration and discovery, and analysis is a subset of this.

That is it from me for now. Looking forward to see what is prioritised next.


#13

So I’ve been thinking about this, and from my experience, the non-rectangular layout is what makes it easy for a human to understand what data they have and begin to analyse it. Changing this into the rectangular layout that is best practise obscures some of that visual understanding, for me at least.

Could Stencila allow data entry in a non-rectangular layout, and then have some transformations that create a rectangular layout for sharing and use in analysis?


#14

Interesting, I am just so used to a “tidy” tabular layout, that I prefer it for data entry and analysis. But I can see that in some contexts, a different layout would be preferable.

Your suggestion is to let users enter data into a sheet and then reshaping it into a tabular format - I suspect that this would be very difficult to implement (without bugs). An alternative, that was suggested by people at our recent workshops, is to have “Stencila Forms” - ie a user interface specifically designed for data entry (like Google Forms or FileMaker). The form’s fields would be automatically generated from the column types in the Sheet - so for example, if you had a column of type “Country” and another of type “Population” a form would get generated with a drop down for country codes and a number input. Somewhat different to your suggestion - but a lot more easy to implement :slight_smile:


#15

I know that colour cells is probably a feature further away in the roadmap but when that feature is implemented, could we have one way to show the user the source of the colour? For example, Google Sheets offers three ways to change the colour:

  1. Change style

  1. Conditional format

  1. Alternating colours

The command “Clean formatting” only remove 1 and user need to check 2 and 3. Would be great to have a panel showing which rule is being apply.


#16

Thanks @rgaiacs for the suggestion! Colouring cells is something that people often want to do in spreadsheets. But there is a danger in conflating formatting (e.g. fill colour, border colour, bold text etc) with data (see discussion above).

So we’d like to explore alternative ways for users to “sematically tag” cells instead of formatting. For example, formatting is often used to reflect cells which are more important (e.g. the sum of a column of values). Instead of using a format to indicate cell importance it may be better to tag them e.g. primary, important or have levels of importance like heading1, heading2 etc in documents. That would allow semantic interpretation of a Sheet, for example to extract the end result of calculations.


#17

I like the idea of add semantic to the cells. :+1: